HBase User Group in Paris
Hi all, I was wondering how many HBase users there are in Paris (France...). Would you guys be interested in participating in a Paris-based user group? The idea would be to share HBase practices, with something like a meet-up per quarter. Reply to me directly or on the list, as you prefer. Cheers, Nicolas
Re: Does hbase 0.90 client work with 0.92 server?
Depending on what you're doing with the data, I guess you might have some corner cases, especially after a major compaction. That may be a non trivial piece of code to write (again, it depends on how you use HBase. May be it is actually trivial). And, if you're pessimistic, the regression in 0.92 can be one of those that corrupts the data, so you will need manual data fixes as well during the rollback. It may be simpler to secure the migration by investing more in the testing process (dry/parallel runs). As well, if you find bugs while a release is in progress, it increases your chances to get your bugs fixed... Nicolas On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy dha...@viadeoteam.comwrote: Actually, I have an old cluster on on prod with 0.90.3 version installed manually and I am working on a CDH4 new cluster deployed full automatic with puppet. While migration is not reversible (according to the pointer given by Jean-Daniel) I would like to keep he old cluster safe by side to be able to revert operation Switching from an old vanilla version to a Cloudera one is an other risk introduced in migrating the actual cluster and I'm not feeling confortable with. My idea is to copy data from old to new and switch clients the new cluster and I am lookin for the best strategy to manage it. A scanner based on timestamp should be enougth to get the last updates after switching (But trying to keep it short). Cheers, -- Damien 2012/9/27 n keywal nkey...@gmail.com You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181, sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair: ? 
at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:60) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion
Re: Does hbase 0.90 client work with 0.92 server?
I understood that you were targeting a backup plan to go back from 0.92 -- 0.90 if anything goes wrong? But in any case, it might work, it depends on the data you're working with and the downtime you're ready to accept. It's not simple to ensure you won't miss any operation and to manage the deletes mixed with compactions. Not taking into account the root issue you may have with the source cluster. For example, if you're migrating back because your 0.92 cluster cannot handle the load, adding a map reduce task to do an export world on top of this might bring this extra little workload that will put it down completely. On Fri, Sep 28, 2012 at 11:59 AM, Damien Hardy dha...@viadeoteam.comwrote: And what about hbase 0.90 export distcp hftp://hdfs0.20/ dfs://hdfs1.0/ hbase 0.92 import ? Then switch client (a rest interface), then recorver the few last update with the same approch limiting export on starttime http://hadoop.apache.org/docs/hdfs/current/hftp.html This way could be safe with a minimal downtime ? Cheers, 2012/9/28 n keywal nkey...@gmail.com Depending on what you're doing with the data, I guess you might have some corner cases, especially after a major compaction. That may be a non trivial piece of code to write (again, it depends on how you use HBase. May be it is actually trivial). And, if you're pessimistic, the regression in 0.92 can be one of those that corrupts the data, so you will need manual data fixes as well during the rollback. It may be simpler to secure the migration by investing more in the testing process (dry/parallel runs). As well, if you find bugs while a release is in progress, it increases your chances to get your bugs fixed... Nicolas On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy dha...@viadeoteam.com wrote: Actually, I have an old cluster on on prod with 0.90.3 version installed manually and I am working on a CDH4 new cluster deployed full automatic with puppet. While migration is not reversible (according to the pointer given by Jean-Daniel) I would like to keep he old cluster safe by side to be able to revert operation Switching from an old vanilla version to a Cloudera one is an other risk introduced in migrating the actual cluster and I'm not feeling confortable with. My idea is to copy data from old to new and switch clients the new cluster and I am lookin for the best strategy to manage it. A scanner based on timestamp should be enougth to get the last updates after switching (But trying to keep it short). Cheers, -- Damien 2012/9/27 n keywal nkey...@gmail.com You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. 
Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181 , sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair
Re: Hbase clustering
Hi, I would like to direct you to the reference guide, but I must acknowledge that, well, it's a reference guide, hence not really easy for a plain new start. You should have a look at Lars' blog (and maybe buy his book), and especially this entry: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html Some hints however:
- the replication occurs at the hdfs level, not the hbase level: hbase writes files that are split into hdfs blocks that are replicated across the datanodes. If you want to check the replication, you must look at what files are written by hbase, how they have been split into blocks by hdfs, and how these blocks have been replicated. That will be in the hdfs interface. As a side note, it's not the easiest thing to learn when you start :-)
- The error "ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times" is not linked to replication or whatever. It means that the second machine cannot find the master. You need to fix this first (googling and checking the logs).
Good luck, Nicolas

On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: How can we verify that the data (tables) is distributed across the cluster? Is there a way to confirm that the data is distributed across all the nodes in the cluster?

On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I am completely new to Hbase. I want to cluster Hbase on two nodes. I installed hadoop and hbase on the two nodes; my conf files are as given below.

cat conf/regionservers
hbase-regionserver1
hbase-master

cat conf/masters
hadoop-namenode

cat conf/slaves
hadoop-datanode1

vim conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

finally my /etc/hosts file is
127.0.0.1 localhost
127.0.0.1 oc-PowerEdge-R610
10.2.32.48 hbase-master hadoop-namenode
10.240.13.35 hbase-regionserver1 hadoop-datanode1

The above files are identical on both of the machines. The following are the processes that are running on my machines when I ran the start scripts in hadoop as well as hbase:
hadoop-namenode: HQuorumPeer, HMaster, Main, HRegionServer, SecondaryNameNode, Jps, NameNode, JobTracker
hadoop-datanode1: TaskTracker, Jps, DataNode -- process information unavailable, Main, NC, HRegionServer

I am able to create, list and scan tables on the hadoop-namenode machine using the Hbase shell. But while trying to run the same on the hadoop-datanode1 machine I couldn't do it, as I am getting the following error:

hbase(main):001:0> list
TABLE
ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could be used to filter the output.
Examples:
  hbase> list
  hbase> list 'abc.*'

How can I list and scan the tables that are created by the hadoop-namenode from the hadoop-datanode1 machine? Similarly, can I create some tables on hadoop-datanode1 and access them from the hadoop-namenode and vice-versa, as the data is distributed since this is a cluster? -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad. -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad.
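A small sketch of one way to answer the distribution question above from a client. The getRegionLocations() call is the 0.94-era HTable API (older releases expose getRegionsInfo() instead), so treat the exact method as an assumption. If every region of the table maps to the same region server, the data has simply not split across the cluster yet:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

public class ShowRegionLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]); // table name passed on the command line
    // Print which region server hosts each region of the table.
    for (Map.Entry<HRegionInfo, ServerName> e : table.getRegionLocations().entrySet()) {
      System.out.println(e.getKey().getRegionNameAsString() + " -> " + e.getValue().getHostname());
    }
    table.close();
  }
}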
Re: Hbase clustering
You should launch the master only once, on whatever machine you like. Then you will be able to access it from any other machine. Please have a look at the blog I mentioned in my previous mail. On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: I can see that HMaster is not started on the data-node machine when the start scripts in hadoop hbase ran on the hadoop-namenode. My doubt is that,Shall we have to start that master on the hadoop-datanode1 too or the hadoop-datanode1 will access the Hmaster that is running on the hadoop-namenode to create,list,scan tables as the two nodes are in the cluster as namenode datanode. On Thu, Sep 27, 2012 at 1:02 PM, n keywal nkey...@gmail.com wrote: Hi, I would like to direct you to the reference guide, but I must acknowledge that, well, it's a reference guide, hence not really easy for a plain new start. You should have a look at Lars' blog (and may be buy his book), and especially this entry: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html Some hints however: - the replication occurs at the hdfs level, not the hbase level: hbase writes files that are split in hdfs blocks that are replicated accross the datanodes. If you want to check the replications, you must look at what files are written by hbase and how they have been split in blocks by hdfs and how these blocks have been replicated. That will be in the hdfs interface. As a side note, it's not the easiest thing to learn when you start :-) - The error ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times this is not linked to replication or whatever. It means that second machine cannot find the master. You need to fix this first. (googling checking the logs). Good luck, Nicolas On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: How can we verify that the data(tables) is distributed across the cluster?? Is there a way to confirm it that the data is distributed across all the nodes in the cluster.? On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I am completely new to Hbase. I want to cluster the Hbase on two nodes.I installed hadoop,hbase on the two nodes my conf files are as given below. *cat conf/regionservers * hbase-regionserver1 hbase-master *cat conf/masters * hadoop-namenode * cat conf/slaves * hadoop-datanode1 *vim conf/hdfs-site.xml * ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- Put site-specific property overrides in this file. -- configuration property namedfs.replication/name value2/value descriptionDefault block replication.The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. /description /property property namedfs.support.append/name valuetrue/value descriptionDefault block replication.The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. /description /property /configuration * finally my /etc/hosts file is * 127.0.0.1 localhost 127.0.0.1 oc-PowerEdge-R610 10.2.32.48 hbase-master hadoop-namenode 10.240.13.35 hbase-regionserver1 hadoop-datanode1 The above files are identical on both of the machines. 
The following are the processes that are running on my m/c's when I ran start scripts in hadoop as well as hbase *hadoop-namenode:* HQuorumPeer HMaster Main HRegionServer SecondaryNameNode Jps NameNode JobTracker *hadoop-datanode1:* TaskTracker Jps DataNode -- process information unavailable Main NC HRegionServer I can able to create,list scan tables on the *hadoop-namenode* machine using Hbase shell. But while trying to run the same on the * hadoop-datanode1 *machine I couldn't able to do it as I am getting following error. hbase(main):001:0 list TABLE ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times Here is some help for this command: List all tables in hbase. Optional regular expression parameter could be used to filter the output. Examples: hbase list hbase list 'abc.*' How can I list,scan the tables that are created by the *hadoop-namenode * from the *hadoop-datanode1* machine. Similarly Can I create some tables on *hadoop-datanode1 * can I access them from the *hadoop-namenode * vice-versa as the data is distributed as this is a cluster. -- Thanks Regards
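A minimal client sketch of the point above: a shell or Java client on any machine finds the single master through ZooKeeper, so it only needs the quorum address in its configuration. The hostnames below are the ones used in this thread; the table name is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectFromAnyNode {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point the client at the ZooKeeper quorum; the master is discovered from there.
    conf.set("hbase.zookeeper.quorum", "hbase-master");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "mytable"); // "mytable" is illustrative
    System.out.println("Connected to " + Bytes.toString(table.getTableName()));
    table.close();
  }
}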
Re: Does hbase 0.90 client work with 0.92 server?
You don't have to migrate the data when you upgrade, it's done on the fly. But it seems you want to do something more complex? A kind of realtime replication between two clusters in two different versions? On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello, Corollary, what is the better way to migrate data from a 0.90 cluster to a 0.92 cluser ? Hbase 0.90 = Client 0.90 = stdout | stdin = client 0.92 = Hbase 0.92 All the data must tansit on a single host where compute the 2 clients. It may be paralalize with mutiple version working with different range scanner maybe but not so easy. Is there a copytable version that should read on 0.90 to write on 0.92 with mapreduce version ? maybe there is some sort of namespace available for Java Classes that we may use 2 version of a same package and go for a mapreduce ? Cheers, -- Damien 2012/9/25 Jean-Daniel Cryans jdcry...@apache.org It's not compatible. Like the guide says[1]: replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart (You cannot do a rolling restart from 0.90.x to 0.92.x -- you must restart) This includes the client. J-D 1. http://hbase.apache.org/book.html#upgrade0.92 On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh saurabh.agar...@citi.com wrote: Hi, We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked fine in hbase 0.90.4. Our new setup has HBase 0.92 server and hbase 0.90.4 client. And throw following exception when client would like to connect to server. Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me know, Thanks, Saurabh. 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181, sessionid = 0x139f61977650034, negotiated timeout = 6 java.lang.IllegalArgumentException: Not a host:port pair: ? 
at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:60) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82) at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:179) at org.apache.hadoop.hbase.HBaseTestingUtility.truncateTable(HBaseTestingUtility.java:609) at com.citi.sponge.flume.sink.ELFHbaseSinkTest.testAppend2(ELFHbaseSinkTest.java:221) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at
Re: H-base Master/Slave replication
Hi, I think there is a confusion between hbase replication (replication between clusters) and hdfs replication (replication between datanodes). hdfs replication is (more or less) hidden and done for you. Nicolas

On Wed, Sep 26, 2012 at 9:20 AM, Venkateswara Rao Dokku dvrao@gmail.com wrote: Hi, I wanted to cluster Hbase on 2 nodes. I put one of my nodes as hadoop-namenode as well as hbase-master, and the other node as hadoop-datanode1 as well as hbase-region-server1. I started the hadoop cluster as well as Hbase on the name-node side. They started fine. I created tables and it went fine on the master. Now I am trying to replicate the data across the nodes. On some of the sites it is mentioned that we have to maintain zookeeper by ourselves. How to do it? Currently my hbase is maintaining the zookeeper. What are the changes I need to make to the conf/ files in order to replicate data between Master/Slave nodes? -- Thanks & Regards, Venkateswara Rao Dokku, Software Engineer, One Convergence Devices Pvt Ltd., Jubilee Hills, Hyderabad.
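To make the distinction concrete, a small sketch that reads back the per-file HDFS replication factor for the files HBase has written (assuming the default /hbase root directory); this is the datanode-level replication that HDFS manages by itself:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowHdfsReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // List the top-level HBase directories and their HDFS replication factor.
    for (FileStatus status : fs.listStatus(new Path("/hbase"))) {
      System.out.println(status.getPath() + " replication=" + status.getReplication());
    }
    fs.close();
  }
}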
Re: RetriesExhaustedWithDetailsException while puting in Table
DoNotRetryIOException means that the error is considered as permanent: it's not a missing regionserver, but for example a table that's not enabled. I would expect a more detailed exception (a 'caused by' or something similar). If it's missing, you should have more info in the regionserver logs.

On Wed, Sep 19, 2012 at 11:54 AM, Dhirendra Singh dps...@gmail.com wrote: I am getting this exception while trying to insert an entry into the table. The table has its secondary index and its coprocessors defined properly. I suspect this error is because the inserted row didn't have all the columns which were required in the secondary index, but I am not sure. Could someone tell me a way to debug this scenario, as the exception is also a bit vague; it actually doesn't tell what went wrong.

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: tserver.corp.nextag.com:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)

-- Warm Regards, Dhirendra Pratap +91. 9717394713
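A sketch of pulling more detail out of the exception on the client side before digging into the region server logs; the per-action causes usually name the row, the server, and the underlying DoNotRetryIOException (for example one thrown by a coprocessor). The wrapper itself is illustrative:

import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class PutWithDiagnostics {
  public static void putAll(HTable table, List<Put> puts) throws Exception {
    try {
      table.put(puts);
    } catch (RetriesExhaustedWithDetailsException e) {
      // One entry per failed action: row, server and root cause.
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("row=" + e.getRow(i) + " server=" + e.getHostnamePort(i)
            + " cause=" + e.getCause(i));
      }
      throw e;
    }
  }
}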
Re: Performance of scan setTimeRange VS manually doing it
For each file, there is a time range. When you scan/search, the file is skipped if there is no overlap between the file timerange and the timerange of the query. As there are other parameters as well (row distribution, compaction effects, cache, bloom filters, ...), it's difficult to know in advance what's going to happen exactly. But specifying a timerange does no harm for sure, if it matches your functional needs... This said, if you already have the rowkey, the time range is less interesting, as you will skip a lot of files already.

On Wed, Sep 12, 2012 at 11:52 PM, Tom Brown tombrow...@gmail.com wrote: When I query HBase, I always include a time range. This has not been a problem when querying recent data, but it seems to be an issue when I query older data (a few hours old). All of my row keys include the timestamp as part of the key (this value is the same as the HBase timestamp for the row). I recently tried an experiment where I manually re-seek to the possible row (based on the timestamp as part of the row key) instead of using setTimeRange on my scan object, and was amazed to see that there was no degradation for older data. Can someone postulate a theory as to why this might be happening? I'm happy to provide extra data if it will help you theorize... Is there a downside to no longer using setTimeRange? --Tom
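For illustration, a minimal sketch of the two approaches discussed in this thread, assuming for simplicity that the rowkey is just the big-endian timestamp (the real key layout will differ, so adapt the start/stop rows accordingly):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScans {
  // Option 1: bound the scan by rowkey only, relying on the timestamp embedded in the key.
  public static Scan byRowKey(long startTs, long endTs) {
    return new Scan(Bytes.toBytes(startTs), Bytes.toBytes(endTs));
  }

  // Option 2: same rowkey bounds plus a timerange hint, which lets HBase skip whole
  // store files whose [min,max] timestamp range does not overlap the query.
  public static Scan byRowKeyAndTimeRange(long startTs, long endTs) throws IOException {
    Scan scan = new Scan(Bytes.toBytes(startTs), Bytes.toBytes(endTs));
    scan.setTimeRange(startTs, endTs); // maxStamp is exclusive
    return scan;
  }
}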
Re: Local debugging (possibly with Maven and HBaseTestingUtility?)
Hi, You can use HBase in standalone mode? Cf. http://hbase.apache.org/book.html#standalone_dist? I guess you already tried and it didn't work? Nicolas On Fri, Sep 7, 2012 at 9:57 AM, Jeroen Hoek jer...@lable.org wrote: Hello, We are developing a web-application that uses HBase as database, with Tomcat as application server. Currently, our server-side code can act as a sort of NoSQL abstraction-layer for either HBase or Google AppEngine. HBase is used in production, AppEngine mainly for testing and demo deployments. Our current development setup is centred around Eclipse, and local testing and debugging is done by running the application from Eclipse, which launches the Jetty application server and connects to a local AppEngine database persisted to a single file in the WEB-INF directory. This allows the developers to easily test new features against an existing (local) database that is persisted as long you don't throw away the binary file yourself. We would like to be able to do the same thing with HBase. So far I have seen examples of HBaseTestingUtility being used in unit tests (usually with Maven), but while that covers unit-testing, I have not been able to find a way to run a local, persistent faux-HBase cluster like AppEngine does. Is there a recommended way of doing this? The reason for wanting to be able to test locally like this is to avoid the overhead of running a local VM with HBase or having to connect to a remote test-cluster when developing. Kind regards, Jeroen Hoek
Re: Extremely slow when loading small amount of data from HBase
Hi, With 8 regionservers, yes, you can. Target a few hundreds by default imho. N. On Wed, Sep 5, 2012 at 4:55 AM, 某因幡 tewil...@gmail.com wrote: +HBase users. -- Forwarded message -- From: Dmitriy Ryaboy dvrya...@gmail.com Date: 2012/9/4 Subject: Re: Extremely slow when loading small amount of data from HBase To: u...@pig.apache.org u...@pig.apache.org I think the hbase folks recommend something like 40 regions per node per table, but I might be misremembering something. Have you tried emailing the hbase users list? On Sep 4, 2012, at 3:39 AM, 某因幡 tewil...@gmail.com wrote: After merging ~8000 regions to ~4000 on an 8-node cluster the things is getting better. Should I continue merging? 2012/8/29 Dmitriy Ryaboy dvrya...@gmail.com: Can you try the same scans with a regular hbase mapreduce job? If you see the same problem, it's an hbase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, is the small range all on one region server, etc) On Aug 27, 2012, at 11:49 PM, 某因幡 tewil...@gmail.com wrote: When I load a range of data from HBase simply using row key range in HBaseStorageHandler, I find that the speed is acceptable when I'm trying to load some tens of millions rows or more, while the only map ends up in a timeout when it's some thousands of rows. What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0. -- language: Chinese, Japanese, English -- language: Chinese, Japanese, English -- language: Chinese, Japanese, English
Re: HBase and unit tests
Hi Cristopher, HBase starts a minicluster for many of its tests because we have a lot of destructive tests. Or the non destructive tests would be impacted by the destructive tests. When writing a client application, you usually don't need to do that: you can rely on the same instance for all your tests. As well, it's useful to write the tests in a way compatible with a real cluster or a pseudo distributed one. Sometimes, when the test fails, you want to have a look at what the code wrote or found in HBase: you won't have this in a mini cluster. And it saves a start. I don't know if there is a blog entry on this; but it's not very difficult to do (but as usual not that easy when you start). I've personally done it with a singleton class + prefixing the table names by a random key (to allow multiple tests in parallel on the same cluster without relying on cleanup) + getProperty to decide between starting a mini cluster or connecting to a cluster. HTH, Nicolas On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: Hi Sonal, Stack and Ulrich! Yes, I should provide more details :$ I reached the links you provided when I was searching for a way to start HBase with JUnit. From default, the only params I have changed are Zookeeper port and the amount of nodes, which is 1 in my case. Based on logs I suspect that most of time are spent with HDFS and that's why I asked if there is a way to start a standalone instance of HBase. The amount of data written at each test case would probably fit in memstore anyway, and table cleansing between each test method is managed by a loop of deletes. At least 15 seconds are spent on starting the mini cluster for each test case. Right now I reminded that I should turn off WAL when running unit tests :-), but this will not reflect on startup time. Thanks!! Best regards, Cristofer De: Ulrich Staudinger [ustaudin...@gmail.com] Enviado: sexta-feira, 31 de agosto de 2012 2:21 Para: user@hbase.apache.org Assunto: Re: HBase and unit tests As a general advice, although you probably do take care of this, instantiate the mini cluster only once in your junit test constructor and not in every test method. at the end of each test, either cleanup your hbase or use a different area per test. best regards, ulrich -- connect on xing or linkedin. sent from my tablet. On 31.08.2012, at 06:46, Stack st...@duboce.net wrote: On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: Hi there! After I started studying HBase, I've searched for open source projects backed by HBase and I found Titan distributed graph database (you probably heard about it). As soon as I read in their documentation that HBase adapter is experimental and suboptimal (disclaimer here: https://github.com/thinkaurelius/titan/wiki/Using-HBase) I volunteered to help improving this adapter and since then I made a few changes to improve on running tests (reduced from hours to minutes) and also an improvement on search feature. Now I'm trying to break the dependency on a pre-installed HBase for unit tests and found miniCluster inside HBase tests, but minicluster demands too much time to start and I don't know if tweaking on configs will improve significantly. Is there a way to start a 'lightweight' instance, like programatically starting a standalone instance? How much is 'too much time' Cristofer? Do you want a standalone cluster at all? St.Ack P.S. 
If digging in this area, you might find the blog post by the sematextians of use: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
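A rough sketch of the pattern described above: a singleton that starts a mini cluster at most once per JVM, or connects to an existing (pseudo-)distributed cluster, decided by a system property, plus a per-run prefix for table names. Class, property and helper names are illustrative, not an HBase API:

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

public final class TestClusterHolder {
  private static final String RUN_PREFIX = "t" + Long.toHexString(new Random().nextLong());
  private static HBaseTestingUtility util;
  private static Configuration conf;

  // Called from every test; the mini cluster is started at most once per JVM.
  public static synchronized Configuration getTestConfiguration() throws Exception {
    if (conf == null) {
      if (Boolean.getBoolean("test.useMiniCluster")) { // assumed property name
        util = new HBaseTestingUtility();
        util.startMiniCluster();
        conf = util.getConfiguration();
      } else {
        conf = HBaseConfiguration.create(); // real or pseudo-distributed cluster
      }
    }
    return conf;
  }

  // Random prefix so parallel test runs on a shared cluster do not collide.
  public static String tableName(String base) {
    return RUN_PREFIX + "_" + base;
  }
}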
Re: HBase and unit tests
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: For the other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, etc) they managed to run tests as Internal and External for unit tests and also have a profile for Performance and Concurrent tests, where External and Performance/Concurrent runs over a live database instance and only with Internal tests it is expected to start a database per test case, remaining the same tests as in External. HBase adapter already have External and Performance/Concurrent so I'm trying to provide the Internal set where the objective is to test Titan|HBase interaction. Understood, thanks for sharing the context. And my goal is to achieve better times than Cassandra :-) Singleton seems to be a good option, but I have to check if Maven Surefire can keep same process between JUnit Test Cases. It should be ok with the parameter forkMode=once in surefire. Because Titan work with adapters for different databases and manage table/CF creation when not exists, I think it will not be possible to prefix table names per test without changing some core components of Titan, and it seems to be too invasive to change this now, and deletion is fast enough so we can keep same table. It's useful on an external cluster, as you can't fully rely on the clean up when a test fails nastily, or if you want to analyse the content. It won't be such an issue on a mini cluster, as it's recreated between the test runs. Thanks!! You're welcome. Keep us updated, and tell us if you have issues.
Re: HBase Is So Slow To Save Data?
Hi Bing, You should expect HBase to be slower in the generic case: 1) it writes much more data (see the hbase data model), with extra column qualifiers, timestamps and so on. 2) the data is written multiple times: once in the write-ahead-log, once per replica on a datanode, and so on again. 3) there are inter-process calls and inter-machine calls on the critical path. This is the cost of the atomicity, reliability and scalability features. With these features in mind, HBase is reasonably fast to save data on a cluster. In your specific case (without points 2 and 3 above), the performance seems to be very bad. You should first look at: - how much time is spent in the put vs. preparing the list - do you have garbage collection going on? even swap? - what's the size of your final Array vs. the available memory? Cheers, N.

On Wed, Aug 29, 2012 at 4:08 PM, Bing Li lbl...@gmail.com wrote: Dear all, By the way, my HBase is in the pseudo-distributed mode. Thanks! Best regards, Bing

On Wed, Aug 29, 2012 at 10:04 PM, Bing Li lbl...@gmail.com wrote: Dear all, According to my experience, it is very slow for HBase to save data. Am I right? For example, today I needed to save the data in a HashMap to HBase. It took more than three hours. However, when saving the same HashMap in a file in text format with the redirected System.out, it took only 4.5 seconds! Why is HBase so slow? Is it indexing? My code to save data in HBase is as follows. I think the code must be correct.

..
public synchronized void AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
{
    List<Put> puts = new ArrayList<Put>();
    String hhNeighborRowKey;
    Put hubKeyPut;
    Put groupKeyPut;
    Put topGroupKeyPut;
    Put timingScalePut;
    Put nodeKeyPut;
    Put hubNeighborTypePut;
    for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
    {
        for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
        {
            for (String neighborKey : groupNeighborEntry.getValue())
            {
                hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey);

                hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN), Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
                puts.add(hubKeyPut);

                groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN), Bytes.toBytes(groupNeighborEntry.getKey()));
                puts.add(groupKeyPut);

                topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN), Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
                puts.add(topGroupKeyPut);

                timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN), Bytes.toBytes(timingScale));
                puts.add(timingScalePut);

                nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN), Bytes.toBytes(neighborKey));
                puts.add(nodeKeyPut);

                hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY), Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN), Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
                puts.add(hubNeighborTypePut);
            }
        }
    }
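A rough sketch of the first check suggested above (how much time goes into HTable.put() versus building the Put list), combined with writing in bounded batches so the whole map does not sit in one huge client-side list; the batching and method names are illustrative, not from the original code:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class TimedBatchPut {
  public static void putInBatches(HTable table, List<Put> puts, int batchSize) throws Exception {
    long start = System.currentTimeMillis();
    List<Put> batch = new ArrayList<Put>(batchSize);
    for (Put p : puts) {
      batch.add(p);
      if (batch.size() >= batchSize) {
        table.put(batch); // one round-trip per batch instead of one per Put
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.put(batch);
    }
    table.flushCommits(); // make sure buffered mutations reach the servers
    System.out.println("put time: " + (System.currentTimeMillis() - start) + " ms");
  }
}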
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Totally random (even on keys that do not exist). It's worth checking whether that matches your real use cases. I expect that reads by row key are most of the time on existing rows (as in a traditional db relationship, or UI- or workflow-driven stuff), even if I'm sure it's possible to have something totally different. It's not going to have an impact all the time. But I can easily imagine scenarios with better performance when the row exists vs. does not exist. For example, you have to read more files to check that the row key is really not there. This will be even more true if you're inserting a lot of data simultaneously (i.e. the files won't be major compacted). On the opposite side, bloom filters may be more efficient in this case. But again, I'm not sure they're going to be efficient on random data. It's like compression algorithms: on really random data, they will all have similar bad results. It does not mean they are equivalent, nor useless. I'm working on it ! Thanks, If you can reproduce a 'bad behavior' or a performance issue, we will try to fix it for sure. Have a nice day, N.
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien, What do you think about that hypothesis ? Yes, there is something fishy to look at here. Difficult to say without more logs as well. Are your gets totally random, or are you doing gets on rows that do exist? That would explain the number of requests vs. empty/full regions. It does not explain everything you're seeing, however. So if you're not exhausting the system resources, there may be a bug somewhere. If you can reproduce the behaviour on a pseudo distributed cluster it could be interesting; as I understand from your previous mail, you have a single client, and maybe a single working server at the end... Nicolas
Re: How to avoid stop-the-world GC for HBase Region Server under big heap size
Hi, For a possible future, there is as well this to monitor: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html More or less requires JDK 1.7 See HBASE-2039 Cheers, N. On Thu, Aug 23, 2012 at 8:16 AM, J Mohamed Zahoor jmo...@gmail.com wrote: Slab cache might help http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ ./zahoor On Thu, Aug 23, 2012 at 11:36 AM, Gen Liu ge...@zynga.com wrote: Hi, We are running Region Server on big memory machine (70G) and set Xmx=64G. Most heap is used as block cache for random read. Stop-the-world GC is killing the region server, but using less heap (16G) doesn't utilize our machines well. Is there a concurrent or parallel GC option that won't block all threads? Any thought is appreciated. Thanks. Gen Liu
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien, As well, if you can share the client code (number of threads, regions, is it a set of single get, or are they multi gets, this kind of stuff). Cheers, N. On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Hi Adrien, I would love to see the region server side of the logs while those socket timeouts happen, also check the GC log, but one thing people often hit while doing pure random read workloads with tons of clients is running out of sockets because they are all stuck in CLOSE_WAIT. You can check that by using lsof. There are other discussion on this mailing list about it. J-D On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, While I'm performing read-intensive benchmarks, I'm seeing storm of CallerDisconnectedException in certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read-performances (from 10 to 5000 reads/sc) in a 10 nodes cluster. My benchmark consists in several iterations launching 10, 100 and 1000 Get requests on a given random rowkey with a single CF/qualifier. I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I do not find very clear informations about these exceptions. From the reference guide : (...) you should consider digging in a bit more if you aren't doing something to trigger them. Well... could you help me digging? :-) -- AM.
Re: Problem - Bringing up the HBase cluster
Hi, Please use the user mailing list (added at dest) for this type of questions instead of the dev list (now in bcc). It's a little bit strange to use the full distributed mode with a single region server. Is the Pseudo-distributed mode working? Check the number of datanodes vs. dfs.replication (default 3). If you have less datanodes then dfs.replication value, it won't work properly. Check as well that the region server is connected to the master. Cheers, On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar kbmku...@gmail.com wrote: Hi, I am trying to bring up a HBase cluster with 1 master and 1 one region server. I am using Hadoop 1.0.3 Hbase 0.94.1 Starting the hdfs was straight forward and i could see the namenode up and running successfully. But the problem is with Hbase. I followed all the guidelines given in the Hbase cluster setup (fully distributed mode) and ran the start-hbase.sh It started the Master, Region server and zookeeper (in the region server) as per my configuration. But i am not sure the master is fully functional. When i try to connect hbase shell and create table, it errors out saying PleaseHoldException- Master is initializing In UI HMaster status shows like this *Assigning META region (since 18mins, 39sec ago)* and i see the Hmaster logs are flowing with the following debug prints, the log file is full of below prints, * 2012-08-22 01:08:19,637 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,638 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,639 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277* Please help me in debugging this. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html Sent from the HBase - Developer mailing list archive at Nabble.com.
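A quick sketch of the datanodes-vs-dfs.replication check from Java (normally the NameNode web UI or 'hadoop dfsadmin -report' gives the same information); getDataNodeStats() is the Hadoop 1.x DistributedFileSystem call:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CheckReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      int liveDatanodes = ((DistributedFileSystem) fs).getDataNodeStats().length;
      int replication = conf.getInt("dfs.replication", 3);
      System.out.println("datanodes=" + liveDatanodes + " dfs.replication=" + replication);
      if (liveDatanodes < replication) {
        // Writes cannot be fully replicated; HBase will not work properly.
        System.out.println("Fewer datanodes than dfs.replication");
      }
    }
    fs.close();
  }
}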
Re: Hbase Shell: UnsatisfiedLinkError
Hi, Well the first steps would be: 1) Use the JDK 1.6 from Oracle. 1.7 is not supported yet. 2) Check the content of http://hbase.apache.org/book.html#configuration to set up your first cluster. Worth reading the whole guide imho. 3) Start with the last released version (.94), except if you have a good reason to use the .90 of course. 4) Use the user mailing list for this type of questions and not the dev one. :-). I kept dev in bcc. Good luck, N. On Wed, Aug 22, 2012 at 12:25 PM, o brbrs obr...@gmail.com wrote: Hi, I'm new at hbase. I installed Hadoop 1.0.3 and Hbase 0.90.6 with Java 1.7.0 on Ubuntu 12.04. When I run hbase shell command, this error occures: $ /usr/local/hbase/bin/hbase shell java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: Could not locate stub library in jar file. Tried [jni/ı386-Linux/libjffi-1.0.so, /jni/ı386-Linux/libjffi-1.0.so] at com.kenai.jffi.Foreign$InValidInstanceHolder.getForeign(Foreign.java:90) at com.kenai.jffi.Foreign.getInstance(Foreign.java:95) at com.kenai.jffi.Library.openLibrary(Library.java:151) at com.kenai.jffi.Library.getCachedInstance(Library.java:125) at com.kenai.jaffl.provider.jffi.Library.loadNativeLibraries(Library.java:66) at com.kenai.jaffl.provider.jffi.Library.getNativeLibraries(Library.java:56) at com.kenai.jaffl.provider.jffi.Library.getSymbolAddress(Library.java:35) at com.kenai.jaffl.provider.jffi.Library.findSymbolAddress(Library.java:45) at com.kenai.jaffl.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:188) at com.kenai.jaffl.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:110) . What is the reason of this error? Please help. Thanks... -- ... Obrbrs
Re: Problem - Bringing up the HBase cluster
If you have a single datanode with a replication of two, it will (basically) won't work, as it will try to replicate the blocks on two datanodes while there is only one available. Note that I'm speaking about datanodes (i.e. hdfs) and not region servers (i.e. hbase). pastebin the full logs with the region server, may be someone will have an idea of the root issue. But I think it's safer to start with the pseudo distributed, it's easier to setup and it's documented. A distributed config with a single node is not really standard, it's better to start with the easiest path imho. On Wed, Aug 22, 2012 at 5:43 PM, Jothikumar Ekanath kbmku...@gmail.com wrote: Hi, Thanks for the response, sorry i put this email in the dev space. My data replication is 2. and yes the region and master server connectivity is good Initially i started with 4 data nodes and 1 master, i faced the same problem. So i reduced the data nodes to 1 and wanted to test it. I see the same issue. I haven't tested the pseudo distribution mode, i can test that. But my objective is to test the full distributed mode and do some testing. I can send my configuration for review. Please let me know if i am missing any basic setup configuration. On Wed, Aug 22, 2012 at 12:00 AM, N Keywal nkey...@gmail.com wrote: Hi, Please use the user mailing list (added at dest) for this type of questions instead of the dev list (now in bcc). It's a little bit strange to use the full distributed mode with a single region server. Is the Pseudo-distributed mode working? Check the number of datanodes vs. dfs.replication (default 3). If you have less datanodes then dfs.replication value, it won't work properly. Check as well that the region server is connected to the master. Cheers, On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar kbmku...@gmail.com wrote: Hi, I am trying to bring up a HBase cluster with 1 master and 1 one region server. I am using Hadoop 1.0.3 Hbase 0.94.1 Starting the hdfs was straight forward and i could see the namenode up and running successfully. But the problem is with Hbase. I followed all the guidelines given in the Hbase cluster setup (fully distributed mode) and ran the start-hbase.sh It started the Master, Region server and zookeeper (in the region server) as per my configuration. But i am not sure the master is fully functional. When i try to connect hbase shell and create table, it errors out saying PleaseHoldException- Master is initializing In UI HMaster status shows like this *Assigning META region (since 18mins, 39sec ago)* and i see the Hmaster logs are flowing with the following debug prints, the log file is full of below prints, * 2012-08-22 01:08:19,637 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,638 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277 2012-08-22 01:08:19,639 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd; serverName=hadoop-datanode1,60020,1345596463277* Please help me in debugging this. 
-- View this message in context: http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html Sent from the HBase - Developer mailing list archive at Nabble.com.
Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id
Hi, What are your queries exactly? What's the HBase version? The mechanism is:
- There is a location cache, per HConnection, on the client
- The client first tries the region server in its cache
- If it fails, the client removes this entry from the cache and enters the retry loop
- There is a limited number of retries and a sleep between the retries
- Most of the time, the client will connect to meta to get the new location
When there are multiple queries, before HBASE-5924, the errors will be analyzed after the other region servers have returned as well. It could be an explanation. HBASE-5877 exists as well, but only for moves, not for splits... Cheers, N.

On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010 deanforwever2...@gmail.com wrote: In the region server's log: 2012-08-10 11:49:50,796 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b. After the region split, the client didn't get a result within the timeout setting (1.5 seconds), then the task was canceled by my program, so the HConnectionManager didn't delete the cachedLocation; the client still queries the old region id, which no longer exists. Moreover, part of my processes updated the region location info, part did not. I'm sure the network is fine; how to fix the problem? Why does it take so long to detect the new regions?
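If the application enforces its own 1.5 s deadline, the default retry settings cannot finish the retry loop described above inside that deadline. A sketch of the relevant client-side knobs (the values are illustrative, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FastFailClientConf {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 3); // fewer retries so failures surface quickly
    conf.setLong("hbase.client.pause", 100);       // ms slept between retries
    conf.setInt("hbase.rpc.timeout", 1000);        // ms allowed per RPC
    return conf;
  }
}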
Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id
If it's a single row, I would expect the server to return the error immediately. Then you will have the sleep I was mentioning previously, but the cache should be cleaned before the sleep... On Fri, Aug 10, 2012 at 1:32 PM, deanforwever2010 deanforwever2...@gmail.com wrote: hi, Keywal my hbase version is 0.94, my query is just to get limited columns of a row, I make a callable task of 1.5 seconds, so maybe it didnot fail but canceled by my process,so the region cache didnot clear after many requests happened. my question is why should it take so long time for failure? and it behave different between my servers, and there is no problem with network. 2012/8/10 N Keywal nkey...@gmail.com Hi, What are your queries exactly? What's the HBase version? The mechanism is: - There is a location cache, per HConnection, on the client - The client first tries the region server in its cache - if it fails, the client removes this entry from the cache and enters the retry loop - there is a limited amount of retries and a sleep between the retries - most of the times, the client will connect to meta to get the new location When there are multiple queries, before HBASE-5924, the errors will be analyzed after the other regions servers has returned as well. It could be an explanation. HBASE-5877 exists as well, but only for moves, not for splits... Cheers, N. On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010 deanforwever2...@gmail.com wrote: on the region server's log :2012-08-10 11:49:50,796 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b. after region split, client didnt get result after timeout setting(1.5 second),then the task is canceled by my program, so the HConnectionManager didnt delete the cachedLocation; the client still query the old region id which is no more exists And more, part of my processes updated the region location info, part not.I'm sure the network is fine; how to fix the problem?why does it need so long time to detect the new regions?
Re: HBaseTestingUtility on windows
Hi Mohit, For simple cases, it works for me for hbase 0.94 at least. But I'm not sure it works for all features. I've never tried to run hbase unit tests on windows for example. N. On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to run mini cluster using HBaseTestingUtility Class from hbase tests on windows, but I get bash command error. Is it not possible to run this utility class on windows? I followed this example: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
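For reference, the kind of "simple case" meant above, as it works on 0.94 on Linux (a sketch; whether the mini cluster scripts behave on Windows is exactly the open question here, and the table/column names are illustrative):

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniClusterSmokeTest {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();
    try {
      HTable table = util.createTable(Bytes.toBytes("t"), Bytes.toBytes("f"));
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
      table.put(put);
      byte[] value = table.get(new Get(Bytes.toBytes("row1"))).getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));
      System.out.println(Bytes.toString(value)); // prints "value"
      table.close();
    } finally {
      util.shutdownMiniCluster();
    }
  }
}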
Re: hbase can't start:KeeperErrorCode = NoNode for /hbase
Hi, The issue is in ZooKeeper, not directly HBase. It seems its data is corrupted, so it cannot start. You can configure zookeeper to another data directory to make it start. N.

On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com abloz...@gmail.com wrote: I even move /hbase to hbase2, and create a new dir /hbase1, modify hbase-site.xml to:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://Hadoop48:54310/hbase1</value>
</property>
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase1</value>
</property>
But the error message still KeeperErrorCode = NoNode for /hbase Any body can give any help? Thanks! Andy zhou

2012/8/2 abloz...@gmail.com abloz...@gmail.com hi all, After I killed all java process, I can't restart hbase, it reports:
Hadoop46: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop46.out
Hadoop47: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop47.out
Hadoop48: starting zookeeper, logging to /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop48.out
Hadoop46: java.lang.RuntimeException: Unable to run quorum server
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
Hadoop46: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
Hadoop46: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
Hadoop46: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
Hadoop46: at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
Hadoop46: at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
Hadoop46: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
Hadoop47: java.lang.RuntimeException: Unable to run quorum server
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
Hadoop47: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
Hadoop47: at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
Hadoop47: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
Hadoop47: at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
Hadoop47: at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
Hadoop47: at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
while Hadoop48 is HMaster. but hdfs://xxx/hbase is existed.
[zhouhh@Hadoop47 ~]$ hadoop fs -ls /hbase
Found 113 items
drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/-ROOT-
drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/.META.
...
So what's the problem? Thanks! andy
Re: Region Server failure due to remote data node errors
Hi Jay, Yes, the whole log would be interesting, plus the logs of the datanode on the same box as the dead RS. What's your hbase hdfs versions? The RS should be immune to hdfs errors. There are known issues (see HDFS-3701), but it seems you have something different... This: java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] Seems to say that the error was between the datanode on the same box as the RS? Nicolas On Mon, Jul 30, 2012 at 6:43 PM, Jay T jay.pyl...@gmail.com wrote: A couple of our region servers (in a 16 node cluster) crashed due to underlying Data Node errors. I am trying to understand how errors on remote data nodes impact other region server processes. *To briefly describe what happened: * 1) Cluster was in operation. All 16 nodes were up, reads and writes were happening extensively. 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN and RS service were running and the power was just pulled out) 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From the log it seems like DFS blocks were being replicated to the nodes that were shutdown (7 and 8) and since replication could not go through successfully DFS client raised errors on 2 and 5 and eventually the RS itself died. The question I am trying to get an answer for is : Is a Region Server immune from remote data node errors (that are part of the replication pipeline) or not. ? * Part of the Region Server Log:* (Node 5) 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.128.204.228:50010 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-316956372096761177_489798 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.128.204.228:50010 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://Node101:8020/hbase/table/754de060 c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124) 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file hdfs://Node101:8020/hbase/table/754de0 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c d1fb2cb4547972a31073d2da124 to hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2da124 2012-07-26 18:53:16,921 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2d a124, entries=1137956, sequenceid=4046717645, filesize=13.2m2012-07-26 18:53:32,048 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for write. 
ch : java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2857) 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5116092240243398556_489796 bad datanode[0] 10.128.204.225:50010 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_5116092240243398556_489796 in pipeline 10.128.204.225:50010, 10.128.204.221:50010, 10.128.204.227:50010: bad datanode 10.128.204.225:50010 I can pastebin the entire log but this is when things started going wrong for Node 5 and eventually shutdown hook for RS started and the RS was shutdown. Any help in troubleshooting this is greatly appreciated. Thanks, Jay
Re: Region Server failure due to remote data node errors
Hi Jay, As you said already, the pipeline for blk_5116092240243398556_489796 contains only dead nodes, and this is likely the cause of the wrong behavior. This block is used by an hlog file, created just before the error. I don't get why there are 3 nodes in the pipeline, I would expect only 2. Do you have a specific setting for dfs.replication? Log files are specific: HBase checks that the replication really occurs by checking the replication count, and closes them if it's not ok. But it seems that all the nodes are dead from the start, and this could be ill-managed in HBase. Reproducing this may be difficult, but should be possible. Then the region server is stopped, but I didn't see in the logs what the path for this was, so it's surprising to say the least. After this, all the errors on 'already closed' are not that critical imho: the close will fail as hdfs closes the file when it cannot recover from an error. I guess your question is still open. But from what I see it could be an HBase bug. I will be interested to know the conclusions of your analysis... Nicolas On Mon, Jul 30, 2012 at 8:01 PM, Jay T jay.pyl...@gmail.com wrote: Thanks for the quick reply Nicolas. We are using HBase 0.94 on Hadoop 1.0.3. I have uploaded the logs here: Region Server log: http://pastebin.com/QEQ22UnU Data Node log: http://pastebin.com/DF0JNL8K Appreciate your help in figuring this out. Thanks, Jay On 7/30/12 1:02 PM, N Keywal wrote: Hi Jay, Yes, the whole log would be interesting, plus the logs of the datanode on the same box as the dead RS. What's your hbase hdfs versions? The RS should be immune to hdfs errors. There are known issues (see HDFS-3701), but it seems you have something different... This: java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] Seems to say that the error was between the datanode on the same box as the RS? Nicolas On Mon, Jul 30, 2012 at 6:43 PM, Jay Tjay.pyl...@gmail.com wrote: A couple of our region servers (in a 16 node cluster) crashed due to underlying Data Node errors. I am trying to understand how errors on remote data nodes impact other region server processes. *To briefly describe what happened: * 1) Cluster was in operation. All 16 nodes were up, reads and writes were happening extensively. 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN and RS service were running and the power was just pulled out) 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From the log it seems like DFS blocks were being replicated to the nodes that were shutdown (7 and 8) and since replication could not go through successfully DFS client raised errors on 2 and 5 and eventually the RS itself died. The question I am trying to get an answer for is : Is a Region Server immune from remote data node errors (that are part of the replication pipeline) or not. ?
* Part of the Region Server Log:* (Node 5) 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.128.204.228:50010 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-316956372096761177_489798 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.128.204.228:50010 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://Node101:8020/hbase/table/754de060 c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124) 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file hdfs://Node101:8020/hbase/table/754de0 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c d1fb2cb4547972a31073d2da124 to hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2da124 2012-07-26 18:53:16,921 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2d a124, entries=1137956, sequenceid=4046717645, filesize=13.2m2012-07-26 18:53:32,048 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 15000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949 remote=/10.128.204.225:50010] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146
Re: Lowering HDFS socket timeouts
Hi Bryan, It's a difficult question, because dfs.socket.timeout is used all over the place in hdfs. I'm currently documenting this. Especially: - it's used for connections between datanodes, and not only for connections between hdfs clients and hdfs datanodes. - It's also used for the two types of datanode connections (ports being 50010 and 50020 by default). - It's used as a connect timeout, but also as a read timeout (socket is connected, but the application does not write for a while). - It's used with various extensions, so when you're seeing stuff like 69000 or 66000 it's often the same setting: timeout + 3s (hardcoded) * #replicas. For a single datanode issue, with everything going well, it will make the cluster much more reactive: hbase will go to another node immediately instead of waiting. But it will make it much more sensitive to gc and network issues. If you have a major hardware issue, something like 10% of your cluster going down, this setting will multiply the number of retries, and will add a lot of workload to your already damaged cluster, and this could make things worse. This said, I think we will need to make it shorter sooner or later, so if you do it on your cluster, it will be helpful... N. On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault bbeaudrea...@gmail.com wrote: Today I needed to restart one of my region servers, and did so without gracefully shutting down the datanode. For the next 1-2 minutes we had a bunch of failed queries from various other region servers trying to access that datanode. Looking at the logs, I saw that they were all socket timeouts after 60000 milliseconds. We use HBase mostly as an online datastore, with various APIs powering various web apps and external consumers. Writes come from both the API in some cases, but we have continuous hadoop jobs feeding data in as well. Since we have web app consumers, this 60 second timeout seems unreasonably long. If a datanode goes down, ideally the impact would be much smaller than that. I want to lower the dfs.socket.timeout to something like 5-10 seconds, but do not know the implications of this. In googling I did not find much precedent for this, but I did find some people talking about upping the timeout to much longer than 60 seconds.
Re: Lowering HDFS socket timeouts
I don't know. The question is mainly for the read time out: you will connect to the ipc.Client with a read timeout of let say 10s. Server side the implementation may do something with another server, with a connect read timeout of 60s. So if you have: HBase -- live DN -- dead DN The timeout will be triggered in HBase while the live DN is still waiting for the answer from the dead dn. It could even retry on another node. On paper, this should work, as this could happen in real life without changing the dfs timeouts.. And may be this case does not even exist. But as the extension mechanism is designed to add some extra seconds, it could exist for this reason or something alike. Worth asking on the hdfs mailing list I would say. On Wed, Jul 18, 2012 at 4:28 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Thanks for the response, N. I could be wrong here, but since this problem is in the HDFS client code, couldn't I set this dfs.socket.timeout in my hbase-site.xml and it would only affect hbase connections to hdfs? I.e. we wouldn't have to worry about affecting connections between datanodes, etc. -- Bryan Beaudreault On Wednesday, July 18, 2012 at 4:38 AM, N Keywal wrote: Hi Bryan, It's a difficult question, because dfs.socket.timeout is used all over the place in hdfs. I'm currently documenting this. Especially: - it's used for connections between datanodes, and not only for connections between hdfs clients hdfs datanodes. - It's also used for the two types of datanodes connection (ports beeing 50010 50020 by default). - It's used as a connect timeout, but as well as a read timeout (socket is connected, but the application does not write for a while) - It's used with various extensions, so when your seeing stuff like 69000 or 66000 it's often the same setting timeout + 3s (hardcoded) * #replica For a single datanode issue, with everything going well, it will make the cluster much more reactive: hbase will go to another node immediately instead of waiting. But it will make it much more sensitive to gc and network issues. If you have a major hardware issue, something like 10% of your cluster going down, this setting will multiply the number of retries, and will add a lot of workload to your already damaged cluster, and this could make the things worse. This said, I think we will need to make it shorter sooner or later, so if you do it on your cluster, it will be helpful... N. On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault bbeaudrea...@gmail.com (mailto:bbeaudrea...@gmail.com) wrote: Today I needed to restart one of my region servers, and did so without gracefully shutting down the datanode. For the next 1-2 minutes we had a bunch of failed queries from various other region servers trying to access that datanode. Looking at the logs, I saw that they were all socket timeouts after 6 milliseconds. We use HBase mostly as an online datastore, with various APIs powering various web apps and external consumers. Writes come from both the API in some cases, but we have continuous hadoop jobs feeding data in as well. Since we have web app consumers, this 60 second timeout seems unreasonably long. If a datanode goes down, ideally the impact would be much smaller than that. I want to lower the dfs.socket.timeout to something like 5-10 seconds, but do not know the implications of this. In googling I did not find much precedent for this, but I did find some people talking about upping the timeout to much longer than 60 seconds. 
Is it generally safe to lower this timeout dramatically if you want faster failures? Are there any downsides to this? Thanks -- Bryan Beaudreault
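If the goal is to change only HBase's own DFSClient, as Bryan suggests, a hedged sketch of the client-side override in the hbase-site.xml used by the region servers (the 10s value is illustrative; datanode-to-datanode timeouts configured in hdfs-site.xml are not touched, which is exactly the caveat discussed above):

    <property>
      <name>dfs.socket.timeout</name>
      <value>10000</value> <!-- milliseconds, read by HBase's HDFS client -->
    </property>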
Re: Maximum number of tables ?
Hi, There is no real limit as far as I know. As you will have one region per table (at least :-), the number of regions will be something to monitor carefully if you need thousands of tables. See http://hbase.apache.org/book.html#arch.regions.size. Don't forget that you can add as many columns as you want, and that an empty cell costs nothing. For example, a class hierarchy is often mapped to multiple tables in an RDBMS, while in HBase having a single table for the same hierarchy makes much more sense. Moreover, there are no transactions between tables, so sometimes a 'uml composition' will go to a single table. And so on. N. On Fri, Jul 13, 2012 at 9:04 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, I read some good practices about the number of columns / column families, but nothing about the number of tables. What if I need to spread my data among hundreds or thousands of (big) tables? What should I care about? I guess I should keep a tight number of storeFiles per RegionServer? -- Adrien Mogenet http://www.mogenet.me
Re: HBaseClient recovery from .META. server power down
Thanks for the jira. The client can be connected to multiple RS, depending on the rows is working on. So yes it's initial, but it's a dynamic initial :-). This said there is a retry on error... On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC of a few seconds, any _new_ client trying to connect would get connect failures. So ... the _initial_ connection to the RS is what would suffer from a super-low setting of the ipc.socket.timeout. This was my read of the code. So - was hoping to get a confirmation if this is the only side effect. Again - this is on the client side - I wouldn't risk doing this on the cluster side ... --Suraj On Mon, Jul 9, 2012 at 9:44 AM, N Keywal nkey...@gmail.com wrote: Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). 
So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e. it does not affect the normal RPC activity as this is just the connect timeout. Am I reading the code right? Any thoughts on whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above timeout tweak is only on the HBase client side. Thanks, --Suraj
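A hedged sketch of the client-side workaround discussed here, in the hbase-site.xml shipped with the client application only (10000 ms is the value Nicolas considers safe below; how much lower one can go is the open question):

    <property>
      <name>ipc.socket.timeout</name>
      <value>10000</value> <!-- milliseconds; connect timeout used by the HBase client IPC layer -->
    </property>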
Re: HBaseClient recovery from .META. server power down
I expect (without double checking the path in the code ;-) that the code in HConnectionManager will retry. On Tue, Jul 10, 2012 at 7:22 PM, Suraj Varma svarma...@gmail.com wrote: Yes. On the maxRetries, though ... I saw the code (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#677) show this.maxRetries = conf.getInt(hbase.ipc.client.connect.max.retries, 0); So - looks like by default, the maxRetries is set to 0? So ... there is effectively no retry (i.e. it is fail-fast) --Suraj On Tue, Jul 10, 2012 at 10:12 AM, N Keywal nkey...@gmail.com wrote: Thanks for the jira. The client can be connected to multiple RS, depending on the rows is working on. So yes it's initial, but it's a dynamic initial :-). This said there is a retry on error... On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma...@gmail.com wrote: I will create a JIRA ticket ... The only side-effect I could think of is ... if a RS is having a GC of a few seconds, any _new_ client trying to connect would get connect failures. So ... the _initial_ connection to the RS is what would suffer from a super-low setting of the ipc.socket.timeout. This was my read of the code. So - was hoping to get a confirmation if this is the only side effect. Again - this is on the client side - I wouldn't risk doing this on the cluster side ... --Suraj On Mon, Jul 9, 2012 at 9:44 AM, N Keywal nkey...@gmail.com wrote: Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 
2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e it does not affect the normal RPC actiivity as this is just the connect timeout. Am I reading the code right? Any thoughts on how whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above
Re: HBaseClient recovery from .META. server power down
Hi, What you're describing -the 35 minutes recovery time- seems to match the code. And it's a bug (still there on trunk). Could you please create a jira for it? If you have the logs it even better. Lowering the ipc.socket.timeout seems to be an acceptable partial workaround. Setting it to 10s seems ok to me. Lower than this... I don't know. N. On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma svarma...@gmail.com wrote: Hello: I'd like to get advice on the below strategy of decreasing the ipc.socket.timeout configuration on the HBase Client side ... has anyone tried this? Has anyone had any issues with configuring this lower than the default 20s? Thanks, --Suraj On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma svarma...@gmail.com wrote: By power down below, I mean powering down the host with the RS that holds the .META. table. (So - essentially, the host IP is unreachable and the RS/DN is gone.) Just wanted to clarify my below steps ... --S On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma svarma...@gmail.com wrote: Hello: We've been doing some failure scenario tests by powering down a .META. holding region server host and while the HBase cluster itself recovers and reassigns the META region and other regions (after we tweaked down the default timeouts), our client apps using HBaseClient take a long time to recover. hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23 Process: 1) Apply load via client app on HBase cluster for several minutes 2) Power down the region server holding the .META. server 3) Measure how long it takes for cluster to reassign META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host). What we see: 1) Client threads spike up to maxThread size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no calls are being serviced - they are all just backed up on a synchronized method ... 2) Essentially, all the client app threads queue up behind the HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312). http://tinyurl.com/7js53dj After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); Essentially, the thread which got the lock would try to connect to the dead RS (till socket times out), retrying, and then the next thread gets in and so forth. Solution tested: --- So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s. We dropped this down to a low number (1000 ms, 100 ms, etc) and the recovery was much faster (in a couple of minutes). So - we're thinking of setting the HBase client side hbase-site.xml with an ipc.socket.timeout of 100ms. Looking at the code, it appears that this is only ever used during the initial HConnection setup via the NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established. i.e it does not affect the normal RPC actiivity as this is just the connect timeout. Am I reading the code right? Any thoughts on how whether this is too low for comfort? (Our internal tests did not show any errors during normal operation related to timeouts etc ... but, I just wanted to run this by the experts.). Note that this above timeout tweak is only on the HBase client side. Thanks, --Suraj
Re: distributed log splitting aborted
Hi Cyril, BTW, have you checked dfs.datanode.max.xcievers and ulimit -n? When underconfigured they can cause this type of errors, even if it seems it's not the case here... Cheers, N. On Fri, Jul 6, 2012 at 11:31 AM, Cyril Scetbon cyril.scet...@free.fr wrote: The file is now missing but I have tried with another one and you can see the error : shell hdfs dfs -ls /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 Found 1 items -rw-r--r-- 4 hbase supergroup 0 2012-07-04 17:06 /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 shell hdfs dfs -cat /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446 12/07/06 09:27:51 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 3 times 12/07/06 09:27:55 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 2 times 12/07/06 09:27:59 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 1 times cat: Could not obtain the last block locations. I'm using hadoop 2.0 from Cloudera package (CDH4) with hbase 0.92.1 Regards Cyril SCETBON On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote: Interesting... Can you read the file? Try a hadoop dfs -cat on it and see if it goes to the end of it. It could also be useful to see a bigger portion of the master log, for all I know maybe it handles it somehow and there's a problem elsewhere. Finally, which Hadoop version are you using? Thx, J-D On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon cyril.scet...@free.fr wrote: yes : /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971 I did a fsck and here is the report : Status: HEALTHY Total size:618827621255 B (Total open files size: 868 B) Total dirs:4801 Total files: 2825 (Files currently being written: 42) Total blocks (validated): 11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41) Minimally replicated blocks: 11479 (100.0 %) Over-replicated blocks:1 (0.008711561 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:4 Average block replication: 4.873 Corrupt blocks:0 Missing replicas: 0 (0.0 %) Number of data-nodes: 12 Number of racks: 1 FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds The filesystem under path '/hbase' is HEALTHY Cyril SCETBON Cyril SCETBON On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote: Does this file really exist in HDFS? hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711 If so, did you run fsck in HDFS? It would be weird if HDFS doesn't report anything bad but somehow the clients (like HBase) can't read it. J-D On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon cyril.scet...@free.fr wrote: Hi, I can nolonger start my cluster correctly and get messages like http://pastebin.com/T56wrJxE (taken on one region server) I suppose Hbase is not done for being stopped but only for having some nodes going down ??? HDFS is not complaining, it's only HBase that can't start correctly :( I suppose some data has not been flushed and it's not really important for me. Is there a way to fix theses errors even if I will lose data ? thanks Cyril SCETBON
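For reference, the usual checks here are the open-file limit for the user running the datanode and region server (ulimit -n, commonly raised to 10240 or more) and, in hdfs-site.xml, something like the following; 4096 is the value commonly recommended for HBase clusters of this era, so treat it as illustrative:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>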
Re: Hmaster and HRegionServer disappearance reason to ask
Hi, It's a ZK expiry on sunday 1st. Root cause could be the leap second bug? N. On Thu, Jul 5, 2012 at 8:59 AM, lztaomin lztao...@163.com wrote: HI ALL My HBase group a total of 3 machine, Hadoop HBase mounted in the same machine, zookeeper using HBase own. Operation 3 months after the reported abnormal as follows. Cause hmaster and HRegionServer processes are gone. Please help me. Thanks The following is a log ABORTING region server serverName=datanode1,60020,1325326435553, load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, aborting Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 2012-07-01 13:45:38,707 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for datanode1,60020,1325326435553 2012-07-01 13:45:38,756 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352, length=5671397 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352 2012-07-01 13:45:39,880 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200 2012-07-01 13:45:39,925 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200 ABORTING region server serverName=datanode2,60020,1325146199444, load=(requests=614, regions=189, usedHeap=3662, maxHeap=8165): regionserver:60020-0x3488dec38a0002 regionserver:60020-0x3488dec38a0002 received expired from ZooKeeper, aborting Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 2012-07-01 13:24:10,308 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341075090535 2012-07-01 13:24:10,918 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 21 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341078690560, length=11778108 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://namenode:9000/hbase/t_speakfor_relation_chapter/ffd2057b46da227e078c82ff43f0f9f2/recovered.edits/00660951991 
(wrote 8178 edits in 403ms) 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in -1268935 ms for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553 2012-07-01 13:24:29,824 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of datanode1,60020,1325326435553, retrying META read org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2408) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1649) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) lztaomin
Re: HMASTER -- odd messages ?
Would Datanode issues impact the HMaster stability? Yes and no. If you have only a few datanodes down, there should be no issue. When there are enough missing datanodes to make some blocks not available at all in the cluster, there are many tasks that cannot be done anymore (to say the least, and depending on the blocks), for the master or for the region server. In this case the ideal contract for the master would be to survive, do the tasks it can, and log the tasks it can't. Today, the contract for the master in such a situation is more 'do your best but don't corrupt anything'. Note that there is an autorestart option in the scripts in the planned 0.96, so the master can be asked to restart automatically if not stopped properly. N. On Tue, Jul 3, 2012 at 7:08 PM, Jay Wilson registrat...@circle-cross-jn.com wrote: My HMaster and HRegionservers start and run for awhile. Looking at the messages, there appear to be some Datanodes with some issues, HLogSplitter has some block issues, the HMaster appears to drop off the network (I know, bad), then it comes back, and then the cluster runs for about 10 more minutes before everything aborts. Questions: . Are HLogSplitter block error messages common? . Would Datanode issues impact the HMaster stability? . Other than an actual network issue is there anything that can cause a 'No route to host'? Thank you --- Jay Wilson 2012-07-03 09:04:58,266 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished 2012-07-03 09:04:58,273 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting/devrackA-03%3A60020.1341328323503 to hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-03%3A60020.1341328323503 2012-07-03 09:04:58,275 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 1052 ms for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting 2012-07-03 09:04:58,277 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting 2012-07-03 09:04:58,277 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting 2012-07-03 09:04:58,277 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting 2012-07-03 09:04:58,278 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-2,5,main]: starting 2012-07-03 09:04:58,278 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 1: hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517, length=124 2012-07-03 09:04:58,278 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,282 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,339 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=0 entries from
hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,341 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split writer threads to finish 2012-07-03 09:04:59,342 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished 2012-07-03 09:04:59,347 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517 to hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-04%3A60020.1341328323517 2012-07-03 09:04:59,349 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 1073 ms for hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting 2012-07-03 09:04:59,352 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1 hlog(s) in hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-05,60020,1341328322976-splitting 2012-07-03 09:04:59,352 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting 2012-07-03 09:04:59,352 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting 2012-07-03 09:04:59,352 INFO
Re: Stargate: ScannerModel
(moving this to the user mailing list, with the dev one in bcc) From what you said it should be customerid_MIN_TX_ID to customerid_MAX_TX_ID. But only if the customerid size is constant. Note that with this rowkey design there will be very few regions involved, so it's unlikely to be parallelized. N. On Thu, Jun 28, 2012 at 7:43 AM, sameer sameer_therat...@infosys.com wrote: Hello, I want to know what the parameters for scan.setStartRow and scan.setStopRow are. My requirement is that I have a table, with the key as customerid_transactionId. I want to scan all the rows, the key of which contains the customer Id that I have. I tried using rowFilter but it is quite slow. If I am using the scan - setStartRow and setStopRow then what would I give as parameters? Thanks, Sameer -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Stargate-ScannerModel-tp2975161p4019139.html Sent from the HBase - Developer mailing list archive at Nabble.com.
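A hedged sketch of the bounded scan described above, assuming the key layout customerid_transactionId with a fixed-size customerid; customerId, MIN_TX_ID and MAX_TX_ID are placeholders for whatever the smallest and largest transaction ids look like in your encoding, and table is an open HTable:

    // Illustrative only; note that the stop row is exclusive, so MAX_TX_ID must sort
    // just past the last transaction id you want returned.
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(customerId + "_" + MIN_TX_ID));  // inclusive
    scan.setStopRow(Bytes.toBytes(customerId + "_" + MAX_TX_ID));   // exclusive
    ResultScanner scanner = table.getScanner(scan);
    for (Result result : scanner) {
      // process result
    }
    scanner.close();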
Re: Scan vs Put vs Get
that appears on the UI? In your random scan how many Regions are scanned whereas in gets may be many due to randomness. Regards Ram -Original Message- From: N Keywal [mailto:nkey...@gmail.com] Sent: Thursday, June 28, 2012 2:00 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Hi Jean-Marc, Interesting :-) Added to Anoop questions: What's the hbase version you're using? Is it repeatable, I mean if you try twice the same gets with the same client do you have the same results? I'm asking because the client caches the locations. If the locations are wrong (region moved) you will have a retry loop, and it includes a sleep. Do you have anything in the logs? Could you share as well the code you're using to get the ~100 ms time? Cheers, N. On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John anoo...@huawei.com wrote: Hi How many Gets you batch together in one call? Is this equal to the Scan#setCaching () that u are using? If both are same u can be sure that the the number of NW calls is coming almost same. Also you are giving random keys in the Gets. The scan will be always sequential. Seems in your get scenario it is very very random reads resulting in too many reads of HFile block from HDFS. [Block caching is enabled?] Also have you tried using Bloom filters? ROW blooms might improve your get performance. -Anoop- From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:04 AM To: user Subject: Scan vs Put vs Get Hi, I have a small piece of code, for testing, which is putting 1B lines in an existing table, getting 3000 lines and scanning 1. The table is one family, one column. Everything is done randomly. Put with Random key (24 bytes), fixed family and fixed column names with random content (24 bytes). Get (batch) is done with random keys and scan with RandomRowFilter. And here are the results. Time to insert 100 lines: 43 seconds (23255 lines/seconds) That's correct for my needs based on the poor performances of the servers in the cluster. I'm fine with the results. Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds) This is way to low. I don't understand why. So I tried the random scan because I'm not able to figure the issue. Time to read 1 lines: 108.0 mseconds (92593 lines/seconds) This it impressive! I have added that after I failed with the get. I moved from 262 lines per seconds to almost 100K lines/seconds!!! It's awesome! However, I'm still wondering what's wrong with my gets. The code is very simple. I'm using Get objects that I'm executing in a Batch. I tried to add a filter but it's not helping. Here is an extract of the code. for (long l = 0; l linesToRead; l++) { byte[] array1 = new byte[24]; for (int i = 0; i array1.length; i++) array1[i] = (byte)Math.floor(Math.random() * 256); Get g = new Get (array1); gets.addElement(g); } Object[] results = new Object[gets.size()]; System.out.println(new java.util.Date () + \gets\ created.); long timeBefore = System.currentTimeMillis(); table.batch(gets, results); long timeAfter = System.currentTimeMillis(); float duration = timeAfter - timeBefore; System.out.println (Time to read + gets.size() + lines : + duration + mseconds ( + Math.round(((float)linesToRead / (duration / 1000))) + lines/seconds)); What's wrong with it? I can't add the setBatch neither I can add setCaching because it's not a scan. I tried with different numbers of gets but it's almost always the same speed. Am I using it the wrong way? Does anyone have any advice to improve that? Thanks, JM
Re: Scan vs Put vs Get
Thank you. It's clearer now. From the code you sent, RandomRowFilter is not used. You're only using the KeyOnlyFilter (the second setFilter replaces the first one; you need to use like FilterList to combine filters). (Note as well that you would need to initialize RandomRowFilter#chance, if not all the rows will be filtered out.) So, in one case -list of gets-, you're reading a well defined set of rows (defined randomly, but well defined :-), and this set spreads all other the regions. In the second one (KeyOnlyFilter), you're reading the first 1K rows you could get from the cluster. This explains the difference between the results. Activating RandomRowFilter should not change much the results, as it's different to select a random set of rows and to get a set of rows defined randomly (don't know if I'm clear here...). Unfortunately you're likely to be more interested of the performance when there is a real selection. Your code for list of gets was correct imho. I'm interested by the results if you activate bloomfilters. Cheers, N. On Thu, Jun 28, 2012 at 3:45 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi N Keywal, This result: Time to read 1 lines : 122.0 mseconds (81967 lines/seconds) Is obtain with this code: HTable table = new HTable(config, test3); final int linesToRead = 1; System.out.println(new java.util.Date () + Processing iteration + iteration + ... ); RandomRowFilter rrf = new RandomRowFilter(); KeyOnlyFilter kof = new KeyOnlyFilter(); Scan scan = new Scan(); scan.setFilter(rrf); scan.setFilter(kof); scan.setBatch(Math.min(linesToRead, 1000)); scan.setCaching(Math.min(linesToRead, 1000)); ResultScanner scanner = table.getScanner(scan); processed = 0; long timeBefore = System.currentTimeMillis(); for (Result result : scanner.next(linesToRead)) { if (result != null) processed++; } scanner.close(); long timeAfter = System.currentTimeMillis(); float duration = timeAfter - timeBefore; System.out.println (Time to read + linesToRead + lines : + duration + mseconds ( + Math.round(((float)linesToRead / (duration / 1000))) + lines/seconds)); table.close (); This is with the scan. scan 80 000 lines/seconds put 20 000 lines/seconds get 300 lines/seconds 2012/6/28, Jean-Marc Spaggiari jean-m...@spaggiari.org: Hi Anoop, Are Bloom filters for columns? If I add g.setFilter(new KeyOnlyFilter()); that mean I can't use bloom filters, right? Basically, what I'm doing here is something like existKey(byte[]):boolean where I try to see if a key exist in the database whitout taking into consideration if there is any column content or not. This should be very fast. Even faster than the scan which need to keep some tracks of where I'm reading for the next row. JM 2012/6/28, Anoop Sam John anoo...@huawei.com: blockCacheHitRatio=69% Seems blocks you are getting from cache. You can check with Blooms also once. You can enable the usage of bloom using the config param io.storefile.bloom.enabled set to true . This will enable the usage of bloom globally Now you need to set the bloom type for your CF HColumnDescriptor#setBloomFilterType() U can check with type BloomType.ROW -Anoop- _ From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:42 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Oh! I never looked at this part ;) Ok. I have it. 
Here are the numbers for one server before the read: blockCacheSizeMB=186.28 blockCacheFreeMB=55.4 blockCacheCount=2923 blockCacheHitCount=195999 blockCacheMissCount=89297 blockCacheEvictedCount=69858 blockCacheHitRatio=68% blockCacheHitCachingRatio=72% And here are the numbers after 100 iterations of 1000 gets for the same server: blockCacheSizeMB=194.44 blockCacheFreeMB=47.25 blockCacheCount=3052 blockCacheHitCount=232034 blockCacheMissCount=103250 blockCacheEvictedCount=83682 blockCacheHitRatio=69% blockCacheHitCachingRatio=72% Don't forget that there is between 40B and 50B of lines in the table, so I don't think the servers can store all of them in memory. And since I'm accessing based on a random key, odds to have the right row in memory are small I think. JM 2012/6/28, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com: In 0.94 The UI of the RS has a metrics table. In that you can see blockCacheHitCount, blockCacheMissCount etc. May be there is a variation when you do scan() and get() here. Regards Ram -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 4:44 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Wow. First, thanks a lot all for jumping into this. Let me try to reply to everyone in a single post. How many Gets you batch together in one call I tried with multiple different values from 10 to 3000 with similar results. Time to read
Re: Scan vs Put vs Get
For the filter list my guess is that you're filtering out all rows because RandomRowFilter#chance is not initialized (it should be something like RandomRowFilter rrf = new RandomRowFilter(0.5);) But note that this test will never be comparable to the test with a list of gets. You can make it as slow/fast as you want by playing with the 'chance' parameter. The results with gets and bloom filter are also in the interesting category, hopefully an expert will get in the loop... On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I mean, bad I did not figured that. Thanks for pointing that. That definitively explain the difference in the performances. I have activated the bloomfilters with this code: HBaseAdmin admin = new HBaseAdmin(config); HTable table = new HTable(config, test3); System.out.println (table.getTableDescriptor().getColumnFamilies()[0]); HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0]; cd.setBloomFilterType(BloomType.ROW); admin.disableTable(test3); admin.modifyColumn(test3, cd); admin.enableTable(test3); System.out.println (table.getTableDescriptor().getColumnFamilies()[0]); And here is the result for the first attempt (using gets): {NAME = 'cf', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'} {NAME = 'cf', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'} Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0... Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds) 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds) 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds) After few more iterations (about 30), I'm between 200 and 250 lines/seconds, like before. Regarding the filterList, I tried, but now I'm getting this error from the servers: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6376193724680783311' does not exist Here is the code: final int linesToRead = 1; System.out.println(new java.util.Date () + Processing iteration + iteration + ... ); RandomRowFilter rrf = new RandomRowFilter(); KeyOnlyFilter kof = new KeyOnlyFilter(); Scan scan = new Scan(); ListFilter filters = new ArrayListFilter(); filters.add(rrf); filters.add(kof); FilterList filterList = new FilterList(filters); scan.setFilter(filterList); scan.setBatch(Math.min(linesToRead, 1000)); scan.setCaching(Math.min(linesToRead, 1000)); ResultScanner scanner = table.getScanner(scan); processed = 0; long timeBefore = System.currentTimeMillis(); for (Result result : scanner.next(linesToRead)) { System.out.println(Result: + result); // if (result != null) processed++; } scanner.close(); It's failing when I try to do for (Result result : scanner.next(linesToRead)). I tried with linesToRead=1000, 100, 10 and 1 with the same result :( I will try to find the root cause, but if you have any hint, it's welcome. JM
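A hedged sketch of the combination described above, with the chance parameter set explicitly (0.5f is illustrative; the constructor takes a float, so the literal needs the f suffix):

    // Keep roughly half of the rows at random, and return keys only (no values).
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new RandomRowFilter(0.5f));
    filters.addFilter(new KeyOnlyFilter());
    Scan scan = new Scan();
    scan.setFilter(filters);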
Re: HBase first steps: Design a table
Hi, Usually I'm inserting about 40 000 rows at a time. Should I do 40 000 calls to put? Or is there any bulk insert method? There is this chapter on bulk loading: http://hbase.apache.org/book.html#arch.bulk.load. But for 40K rows you may just want to use void put(final List<Put> puts) in HTableInterface, that will save a lot of rpc calls. Cheers, N.
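A hedged sketch of the batched-put approach, assuming an open HTable named table; the family, qualifier and row naming are illustrative, and the client splits the list into one RPC per region server:

    List<Put> puts = new ArrayList<Put>(40000);
    for (int i = 0; i < 40000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      puts.add(put);
    }
    table.put(puts);       // one client call instead of 40 000
    table.flushCommits();  // only needed if autoFlush has been turned off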
Re: Region is not online Exceptions
Hi, You can have this if the region moved, i.e. it was previously managed by this region server and is now managed by another. The client keeps a cache of the locations, so after a move it will first contact the wrong server. Then the client will update its cache. By default there are 10 internal retries, so the next retry will be the right one, and this error should not be seen in the client code. In 0.96 the region server will send back a RegionMovedException with the new location if the move is not too old (less than around 5 minutes, if I remember correctly). N. On Thu, Jun 7, 2012 at 9:36 PM, arun sirimalla arunsi...@gmail.com wrote: Hi, My Hbase cluster seems to work fine, but I see some exceptions in one of the RegionServers with the below message 2012-06-07 19:24:48,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0 2012-06-07 19:24:56,154 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: -ROOT-,,0 Though this regionserver is not hosting the ROOT region. The -ROOT- region is hosted by another RegionServer. Can someone please tell me why these exceptions occur? Thanks Arun
Re: hosts unreachables
Yes, this is the balance process (as its name says: keeps the cluster balanced), and it's not related to the process of looking after dead nodes. The nodes are monitored by ZooKeeper, the timeout is by default 180 seconds (setting: zookeeper.session.timeout) On Fri, Jun 1, 2012 at 4:40 PM, Cyril Scetbon cyril.scet...@free.fr wrote: I've another regionserver (hb-d2) that crashed (I can easily reproduce the issue by continuing injections), and as I see in master log, it gets information about hb-d2 every 5 minutes. I suppose it's what helps him to note if a node is dead or not. However it adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since the last time it got the server information. Is it normal ? 2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:12:36,328 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:17:36,337 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:22:36,346 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 2012-06-01 13:27:36,353 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,133 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47 .. 
2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [hb-d2,60020,1338553126560] 2012-06-01 13:32:20,048 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=hb-d2,60020,1338553126560 to dead servers, submitted shutdown handler to be executed, root=false, meta=false 2012-06-01 13:32:20,048 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for hb-d2,60020,1338553126560 On 6/1/12 3:25 PM, Cyril Scetbon wrote: I've added hbase.hregion.memstore.mslab.enabled = true to the configuration of all regionservers and add flags -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 to the hbase environment However my regionservers are still crashing when I load data into the cluster Here are the logs for the node hb-d3 that crashed at 12:56 - GC logs : http://pastebin.com/T0d0y8pZ - regionserver logs : http://pastebin.com/n6v9x3XM thanks On 5/31/12 11:12 PM, Jean-Daniel Cryans wrote: Both, also you could bigger log snippets (post them on something like pastebin.com) and we could see more evidence of the issue. J-D On Thu, May 31, 2012 at 2:09 PM, Cyril Scetboncyril.scet...@free.fr wrote: On 5/31/12 11:00 PM, Jean-Daniel Cryans wrote: What I'm seeing looks more like GC issues. Start reading this: http://hbase.apache.org/book.html#gc J-D Hi, Really not sure cause I've enabled gcc's verbose option and I don't see anything taking a long time. Maybe I can check again on one node. On which node do
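Regarding the zookeeper.session.timeout point at the top of this thread: the 13:32:20 expiration above is ZooKeeper deleting the region server's ephemeral node after the session timeout, and the 5-minute LoadBalancer lines are unrelated to dead-node detection. If you need to tolerate longer pauses, the timeout is set in hbase-site.xml; a minimal snippet (value in milliseconds, 180000 being the default mentioned above):

    <property>
      <name>zookeeper.session.timeout</name>
      <value>180000</value> <!-- milliseconds: the 180 s default mentioned above -->
    </property>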
Re: Null rowkey with empty get operation
There is a one to one mapping between the result and the get arrays; so the result for rowkeys[i] is in results[i]. That's not what you want? On Tue, May 29, 2012 at 9:34 AM, Ben Kim benkimkim...@gmail.com wrote: Maybe I showed you a bad example. This makes more sense when it comes to using ListGet For instance, ListGet gets = new ArrayList(); for(String rowkey : rowkeys){ Get get = new Get(Bytes.toBytes(rowkey)); get.addFamily(family); Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)); get.setFilter(filter); gets.add(get); } Result[] results = table.get(get); Now I have multiple results, I need to find the rowkey of the result that has no keyvalue. but results[0].getRow() is null if results[0] has no keyvalue. so it's hard to derive which row the empty result belongs to :( Thank you for your response, Ben On Tue, May 29, 2012 at 2:33 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Ben, In HBase rowkey exists with KVs only. As in your case there is no KVs in the result, and so no rowkey. What is the use case that you are referring here? When you issued Get with a rowkey and empty result for that , you know the rowkey already right? I mean any specific reason why you try to find the rowkey from the result object? -Anoop- From: Ben Kim [benkimkim...@gmail.com] Sent: Tuesday, May 29, 2012 6:42 AM To: user@hbase.apache.org Subject: Null rowkey with empty get operation I have following Get code with HBase 0.92.0 Get get = new Get(Bytes.toBytes(rowkey)); get.addFamily(family); Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)); get.setFilter(filter); Result r = table.get(get); System.out.println(r); // (1) prints keyvalues=NONE System.out.println(Bytes.toString(r.getRow())); // (2) throws NullpointerException printing out the result shows that all columns in a row was filtered out. but i still want to print out the row key of the empty result. But the value of r.getRow() is null Shouldn't r.getRow() return the rowkey even if the keyvalues are emtpy? -- *Benjamin Kim** benkimkimben at gmail* -- *Benjamin Kim* **Mo : +82 10.5357.0521* benkimkimben at gmail*
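A sketch of using that one-to-one mapping to recover the row key of an empty result, assuming the table, family, item and rowkeys (a List<String>) from the quoted snippet:

    List<Get> gets = new ArrayList<Get>();
    for (String rowkey : rowkeys) {
      Get get = new Get(Bytes.toBytes(rowkey));
      get.addFamily(family);
      get.setFilter(new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(item)));
      gets.add(get);
    }
    Result[] results = table.get(gets);   // pass the list, not a single Get
    for (int i = 0; i < results.length; i++) {
      if (results[i].isEmpty()) {
        // results[i] corresponds to gets.get(i), so the row key comes from the request side
        System.out.println("no matching columns for row " + rowkeys.get(i));
      }
    }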
Re: Issues with Java sample for connecting to remote Hbase
From http://hbase.apache.org/book/os.html: HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions, for example, will default to 127.0.1.1 and this will cause problems for you. It worths reading the whole section ;-). You also don't need to set the master address: it will be read from zookeeper. I.e. you can remove this line from your client code: config.set(hbase.master, 10.78.32.131:60010); N. On Tue, May 29, 2012 at 3:46 PM, AnandaVelMurugan Chandra Mohan ananthu2...@gmail.com wrote: Thanks for the response. It still errors out. On Tue, May 29, 2012 at 7:05 PM, Mohammad Tariq donta...@gmail.com wrote: change the name from localhost to something else in the line 10.78.32.131 honeywel-4a7632 localhost and see if it works Regards, Mohammad Tariq On Tue, May 29, 2012 at 6:59 PM, AnandaVelMurugan Chandra Mohan ananthu2...@gmail.com wrote: I have HBase version 0.92.1 running in standalone mode. I created a table and added few rows using hbase shell. Now I am developing a standalone java application to connect to Hbase and retrieve the data from the table. * This is the code I am using * Configuration config = HBaseConfiguration.create(); config.clear(); config.set(hbase.zookeeper.quorum, 10.78.32.131); config.set(hbase.zookeeper.property.clientPort,2181); config.set(hbase.master, 10.78.32.131:60010); HBaseAdmin.checkHBaseAvailable(config); // This instantiates an HTable object that connects you to the myTable // table. HTable table = new HTable(config, asset); Get g = new Get(Bytes.toBytes(APU 331-350)); Result r = table.get(g); *This is the content of my /etc/hosts file* #127.0.0.1 localhost.localdomain localhost #10.78.32.131 honeywel-4a7632 #127.0.1.1 honeywel-4a7632 ::1 honeywel-4a7632 localhost6.localdomain6 localhost6 10.78.32.131 honeywel-4a7632 localhost * This is part of my error stack trace* 12/05/29 18:53:33 INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt 0 of 1 failed; no more retrying. 
java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy5.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:106) at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1553) at hbaseMain.main(hbaseMain.java:27) 12/05/29 18:53:33 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13798c3ce190003 12/05/29 18:53:33 INFO zookeeper.ZooKeeper: Session: 0x13798c3ce190003 closed 12/05/29 18:53:33 INFO zookeeper.ClientCnxn: EventThread shut down Can some one help me fix this? Thanks a lot. -- Regards, Anand -- Regards, Anand
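Following the advice above, the quoted client code with the hbase.master line removed; the IP, client port and table name are simply the values taken from that code:

    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "10.78.32.131");
    config.set("hbase.zookeeper.property.clientPort", "2181");
    // no hbase.master entry: the client reads the active master's address from ZooKeeper
    HBaseAdmin.checkHBaseAvailable(config);
    HTable table = new HTable(config, "asset");
    Result r = table.get(new Get(Bytes.toBytes("APU 331-350")));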
Re: understanding the client code
Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: understanding the client code
There are two levels: - communication between hbase client and hbase cluster: this is the code you have in hbase client package. As a end user you don't really care, but you care if you want to learn hbase internals. - communication between customer code and hbase as a whole if you don't want to use the hbase client. Then several options are available, thrift being one of them (I'm not sure of avro status). What do you want to do exactly? On Tue, May 29, 2012 at 4:33 PM, S Ahmed sahmed1...@gmail.com wrote: So how does thrift and avro fit into the picture? (I believe I saw references to that somewhere, are those alternate connection libs?) I know protobuf is just generating types for various languages... On Tue, May 29, 2012 at 10:26 AM, N Keywal nkey...@gmail.com wrote: Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: understanding the client code
So it's the right place for the internals :-). The main use case for the thrift api is when you have non java client code. On Tue, May 29, 2012 at 5:07 PM, S Ahmed sahmed1...@gmail.com wrote: I don't really want any, I just want to learn the internals :) So why would someone not want to use the client, for data intensive tasks like mapreduce etc. where they want direct access to the files? On Tue, May 29, 2012 at 11:00 AM, N Keywal nkey...@gmail.com wrote: There are two levels: - communication between hbase client and hbase cluster: this is the code you have in hbase client package. As a end user you don't really care, but you care if you want to learn hbase internals. - communication between customer code and hbase as a whole if you don't want to use the hbase client. Then several options are available, thrift being one of them (I'm not sure of avro status). What do you want to do exactly? On Tue, May 29, 2012 at 4:33 PM, S Ahmed sahmed1...@gmail.com wrote: So how does thrift and avro fit into the picture? (I believe I saw references to that somewhere, are those alternate connection libs?) I know protobuf is just generating types for various languages... On Tue, May 29, 2012 at 10:26 AM, N Keywal nkey...@gmail.com wrote: Hi, If you're speaking about preparing the query it's in HTable and HConnectionManager. If you're on the pure network level, then, on trunk, it's now done with a third party called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 2012 at 4:15 PM, S Ahmed sahmed1...@gmail.com wrote: I'm looking at the client code here: https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client Is this the high level operations, and the actual sending of this data over the network is done somewhere else? For example, during a PUT, you may want it to write to n nodes, where is the code that does that? And the actual network connection etc?
Re: HBase (BigTable) many to many with students and courses
Hi, For the multiget, if it's small enough, it will be: - parallelized on all region servers concerned. i.e. you will be as fast as the slowest region server. - there will be one query per region server (i.e. gets are grouped by region server). If there are too many gets, it will be split in small subsets and the strategy above will be used for each subset, doing one subset after another (and blocking between them). so Large set -- Small set will be ok from this point of view. Large -- Large won't. N. On Tue, May 29, 2012 at 5:54 PM, Em mailformailingli...@yahoo.de wrote: Ian, thanks for your detailed response! Let me give you feedback to each point: 1. You could denormalize the additional information (e.g. course name) into the students table. Then, you're simply reading the student row, and all the info you need is there. That places an extra burden of write time and disk space, and does make you do a lot more work when a course name changes. That's exactly what I thought about and that's why I avoid it. The students and courses example is an example you find at several points on the web, when describing the differences and translations of relations from an RDBMS into a Key-Value-store. In fact, everything you model with a Key-Value-storage like HBase, Cassandra etc. can be modeled as an RDMBS-scheme. Since a lot of people, like me, are coming from that edge, we must re-learn several basic things. It starts with understanding that you model a K-V-storage the way you want to access the data, not as the data relates to eachother (in general terms) and ends with translating the connections of data into a K-V-schema as good as possible. 2. You could do what you're talking about in your HBase access code: find the list of course IDs you need for the student, and do a multi get on the course table. Fundamentally, this won't be much more efficient to do in batch mode, because the courses are likely to be evenly spread out over the region servers (orthogonal to the students). You're essentially doing a hash join, except that it's a lot less pleasant than on a relational DB b/c you've got network round trips for each GET. The disk blocks from the course table (I'm assuming it's the smaller side) will likely be cached so at least that part will be fast--you'll be answering those questions from memory, not via disk IO. Whow, what? I thought a Multiget would reduce network-roundtrips as it only accesses each region *one* time, fetching all the queried keys and values from there. If your data is randomly distributed, this could result in the same costs as with doing several Gets in a loop, but should work better if several Keys are part of the same region. Am I right or did I missunderstood the concept??? 3. You could also let a higher client layer worry about this. For example, your data layer query just returns a student with a list of their course IDs, and then another process in your client code looks up each course by ID to get the name. You can then put an external caching layer (like memcached) in the middle and make things a lot faster (though that does put the burden on you to have the code path for changing course info also flush the relevant cache entries). In your example, it's unlikely any institution would have more than a few thousand courses, so they'd probably all stay in memory and be served instantaneously. Hm, in what way does this give me an advantage over using HBase - assuming that the number of courses is small enough to fit in RAM - ? 
I know that Memcached is optimized for this purpose and might have much faster response times - no doubts. However, from a conceptual point of view: Why does Memcached handles the K-V-distribution more efficiently than a HBase with warmed caches? Hopefully this question isn't that hard :). This might seem laborious, and to a degree it is. But note that it's difficult to see the utility of HBase with toy examples like this; if you're really storing courses and students, don't use HBase (unless you've got billions of students and courses, which seems unlikely). The extra thought you have to put in to making schemas work for you in HBase is only worth it when it gives you the ability to scale to gigantic data sets where other solutions wouldn't. Well, the background is a private project. I know that it's a lot easier to do what I want in a RDBMS and there is no real need for using a highly scalable beast like HBase. However, I want to learn something new and since I do not break someone's business by trying out new technology privately, I want to go with HStack. Without ever doing it, you never get a real feeling of when to use the right tool. Using a good tool for the wrong problem can be an interesting experience, since you learn some of the do's and don'ts of the software you use. Since I am a reader of the MEAP-edition of HBase in Action, I am aware of the
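On the multiget behaviour described at the top of this thread (gets grouped per region server, very large sets split into subsets): a client-side sketch, assuming 'courses' is an open HTable and courseIds a List<byte[]> of row keys for one student; the chunk size is an illustrative assumption, not an HBase constant:

    List<Get> gets = new ArrayList<Get>();
    for (byte[] courseId : courseIds) {
      gets.add(new Get(courseId));
    }
    int chunkSize = 1000;  // illustrative: keeps each multiget small
    List<Result> all = new ArrayList<Result>();
    for (int i = 0; i < gets.size(); i += chunkSize) {
      List<Get> chunk = gets.subList(i, Math.min(i + chunkSize, gets.size()));
      // the gets in a chunk are grouped by region server, so each chunk costs roughly
      // one round trip per region server it touches
      Collections.addAll(all, courses.get(chunk));
    }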
Re: batch insert performance
Hi, What version are you using? On trunk, put(Put) and put(List<Put>) call the same code, so I would expect comparable performance when autoflush is set to false. However, with 250K small puts you may have the gc playing a role. What are the results if you do the inserts with 50 times 5K rows? N. On Sun, May 27, 2012 at 1:58 AM, Faruk Berksöz fberk...@gmail.com wrote: Codes and their results:
Code 1 (List<Put> batchAllRows, 250,000 rows): table.setAutoFlush(false); for (Put mRow : batchAllRows) { table.put(mRow); } table.flushCommits(); -- average elapsed time 27 sec
Code 2 (List<Put> batchAllRows, 250,000 rows): table.setAutoFlush(false); table.put(batchAllRows); table.flushCommits(); -- average elapsed time 103 sec
Code 3 (List<Row> batchAllRows, 250,000 rows): table.setAutoFlush(false); Object[] results = new Object[batchAllRows.size()]; table.batch(batchAllRows, results); /* table.batch(batchAllRows) already tried */ table.flushCommits(); -- average elapsed time 105 sec
-- Forwarded message -- From: Faruk Berksöz fberk...@gmail.com Date: 2012/5/27 Subject: batch insert performance To: user@hbase.apache.org Hi, HBase users, I have 250,000 rows in a list. I want to insert all the rows into an HTable as fast as possible. I have 3 different pieces of code and 3 different elapsed times. Why are HTable.batch(List<? extends Row> actions, Object[] results) and HTable.put(List<Put> puts) 4 times slower than code 1, which inserts the records in a simple loop? Codes and their results: Faruk
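A sketch of the "50 times 5K rows" suggestion above, assuming the same HTable and the batchAllRows list (a List<Put>) from the quoted test; the chunk size is illustrative:

    table.setAutoFlush(false);
    int chunk = 5000;
    for (int i = 0; i < batchAllRows.size(); i += chunk) {
      List<Put> slice = batchAllRows.subList(i, Math.min(i + chunk, batchAllRows.size()));
      table.put(slice);      // buffered client-side because autoflush is off
    }
    table.flushCommits();    // flush whatever is left in the write buffer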
Re: Important Undefined Error
Hi, There could be multiple issues, but it's strange to have in hbase-site.xml valuehdfs://namenode:9000/hbase/value while the core-site.xml says: valuehdfs://namenode:54310//value The two entries should match. I would recommend to: - use netstat to check the ports (netstat -l) - do the check recommended by Harsh J previously. N. On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: pleas hel From: dalia.mohso...@hotmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Mon, 14 May 2012 12:20:18 +0200 Hi, I tried what you told me, but nothing worked:((( First when I run this command:dalia@namenode:~$ host -v -t A `hostname`Output:Trying namenodeHost namenode not found: 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My core-site.xml:configurationproperty namefs.default.name/name !--valuehdfs://namenode:8020/value-- valuehdfs://namenode:54310//value/property/configuration My hdfs-site.xmlconfigurationpropertynamedfs.name.dir/namevalue/data/1/dfs/nn,/nfsmount/dfs/nn/value/property!--propertynamedfs.data.dir/namevalue/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn/value/property--propertynamedfs.datanode.max.xcievers/namevalue4096/value/propertypropertynamedfs.replication/namevalue3/value/propertyproperty namedfs.permissions.superusergroup/name valuehadoop/value/property My Mapred-site.xmlconfigurationnamemapred.local.dir/namevalue/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local/value/configuration My Hbase-site.xmlconfigurationpropertynamehbase.cluster.distributed/name valuetrue/value/propertyproperty namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value/propertypropertynamehbase.zookeeper.quorun/name valuenamenode/value/propertypropertynamehbase.regionserver.port/namevalue60020/valuedescriptionThe host and port that the HBase master runs at./description/propertypropertynamedfs.replication/namevalue1/value/propertypropertynamehbase.zookeeper.property.clientPort/namevalue2181/valuedescriptionProperty from ZooKeeper's config zoo.cfg.The port at which the clients will connect./description/property/configuration Please Help I am really disappointed I have been through all that for two weeks From: dwivedishash...@gmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Sat, 12 May 2012 23:31:49 +0530 The problem is your hbase is not able to connect to Hadoop, can you put your hbase-site.xml content here.. have you specified localhost somewhere, if so remove localhost from everywhere and put your hdfsl namenode address suppose your namenode is running on master:9000 then put your hbase file system setting as master:9000/hbase here I am sending you the configuration which I am using in hbase and is working My hbase-site.xml content is ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- /** * Copyright 2010 The Apache Software Foundation * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * License); you may not use this file except in compliance * with the License. 
You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- configuration property namehbase.rootdir/name valuehdfs://master:9000/hbase/value /property property namehbase.master/name valuemaster:6/value descriptionThe host and port that the HBase master runs at./description /property property namehbase.regionserver.port/name value60020/value descriptionThe host and port that the HBase master runs at./description /property !--property namehbase.master.port/name value6/value descriptionThe host and port that the HBase master runs at./description /property-- property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.tmp.dir/name value/home/shashwat/Hadoop/hbase-0.90.4/temp/value /property property namehbase.zookeeper.quorum/name valuemaster/value /property property namedfs.replication/name value1/value /property property namehbase.zookeeper.property.clientPort/name value2181/value descriptionProperty from ZooKeeper's config zoo.cfg. The port at which the clients will connect. /description /property
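To illustrate the "two entries should match" point above: whichever host and port the HDFS namenode actually listens on has to appear in both files. Using the namenode:54310 value from the quoted core-site.xml (an assumption about that particular setup):

    core-site.xml:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode:54310/</value>
    </property>

    hbase-site.xml:
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode:54310/hbase</value>
    </property>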
Re: Important Undefined Error
In core-file.xml, do you have this? configuration property namefs.default.name/name valuehdfs://namenode:8020/hbase/value /property If you want hbase to connect to 8020 you must have hdfs listening on 8020 as well. On Mon, May 14, 2012 at 5:17 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: H I have tried to make both ports the same. But the prob is the hbase cannot connect to port 8020. When i run nmap hostname, port 8020 wasnt with the list of open ports. I have tried what harsh told me abt. I used the same port he used but same error occurred. Another aspect in cloudera doc it says that i have to canonical name for the host ex: namenode.example.com as the hostname, but i didnt find it in any tutorial. No one makes it. Note that i am deploying my cluster in fully distributed mode i.e am using 4 machines.. So any ideas??!! Sent from my iPhone On 2012-05-14, at 4:07 PM, N Keywal nkey...@gmail.com wrote: Hi, There could be multiple issues, but it's strange to have in hbase-site.xml valuehdfs://namenode:9000/hbase/value while the core-site.xml says: valuehdfs://namenode:54310//value The two entries should match. I would recommend to: - use netstat to check the ports (netstat -l) - do the check recommended by Harsh J previously. N. On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: pleas hel From: dalia.mohso...@hotmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Mon, 14 May 2012 12:20:18 +0200 Hi, I tried what you told me, but nothing worked:((( First when I run this command:dalia@namenode:~$ host -v -t A `hostname`Output:Trying namenodeHost namenode not found: 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My core-site.xml:configurationproperty namefs.default.name/name !--valuehdfs://namenode:8020/value-- valuehdfs://namenode:54310//value/property/configuration My hdfs-site.xmlconfigurationpropertynamedfs.name.dir/namevalue/data/1/dfs/nn,/nfsmount/dfs/nn/value/property!--propertynamedfs.data.dir/namevalue/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn/value/property--propertynamedfs.datanode.max.xcievers/namevalue4096/value/propertypropertynamedfs.replication/namevalue3/value/propertyproperty namedfs.permissions.superusergroup/name valuehadoop/value/property My Mapred-site.xmlconfigurationnamemapred.local.dir/namevalue/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local/value/configuration My Hbase-site.xmlconfigurationpropertynamehbase.cluster.distributed/name valuetrue/value/propertyproperty namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value/propertypropertynamehbase.zookeeper.quorun/name valuenamenode/value/propertypropertynamehbase.regionserver.port/namevalue60020/valuedescriptionThe host and port that the HBase master runs at./description/propertypropertynamedfs.replication/namevalue1/value/propertypropertynamehbase.zookeeper.property.clientPort/namevalue2181/valuedescriptionProperty from ZooKeeper's config zoo.cfg.The port at which the clients will connect./description/property/configuration Please Help I am really disappointed I have been through all that for two weeks From: dwivedishash...@gmail.com To: user@hbase.apache.org Subject: RE: Important Undefined Error Date: Sat, 12 May 2012 23:31:49 +0530 The problem is your hbase is not able to connect to Hadoop, can you put your hbase-site.xml content here.. 
have you specified localhost somewhere, if so remove localhost from everywhere and put your hdfsl namenode address suppose your namenode is running on master:9000 then put your hbase file system setting as master:9000/hbase here I am sending you the configuration which I am using in hbase and is working My hbase-site.xml content is ?xml version=1.0? ?xml-stylesheet type=text/xsl href=configuration.xsl? !-- /** * Copyright 2010 The Apache Software Foundation * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * License); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- configuration property namehbase.rootdir/name valuehdfs://master:9000/hbase/value /property property namehbase.master/name valuemaster
Re: RegionServer silently stops (only issue: CMS-concurrent-mark ~80sec)
Hi Alex, On the same idea, note that hbase is launched with -XX:OnOutOfMemoryError=kill -9 %p. N. On Tue, May 1, 2012 at 10:41 AM, Igal Shilman ig...@wix.com wrote: Hi Alex, just to rule out, oom killer, Try this: http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau alex.barano...@gmail.com wrote: Hello, During recent weeks I constantly see some RSs *silently* dying on our HBase cluster. By silently I mean that process stops, but no errors in logs [1]. The only thing I can relate to it is long CMS-concurrent-mark: almost 80 seconds. But this should not cause issues as it is not a stop-the-world process. Any advice? HBase: hbase-0.90.4-cdh3u3 Hadoop: 0.20.2-cdh3u3 Thank you, Alex Baranau [1] last lines from RS log (no errors before too, and nothing written in *.out file): 2012-04-30 18:52:11,806 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for agg-sa-1.3,0011| te|dtc|\x00\x00\x00\x00\x00\x00\x1E\x002\x00\x00\x00\x015\x9C_n\x00\x00\x00\x00\x00\x00\x00\x00\x00,1334852280902.4285f9339b520ee617c087c0fd0dbf65. because regionserver60020.cacheFlusher; priority=-1, compaction queue size=0 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new createWriter -- HADOOP-6840 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://xxx.ec2.internal/hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651, syncFs=true, hflush=false 2012-04-30 18:54:58,874 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335811856672, entries=73789, filesize=63773934. New hlog /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up with memory above low water. 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805. 
due to global heap pressure 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805., current region memstore size 138.1m 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores 2012-04-30 18:56:56,303 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=322.84 MB, free=476.34 MB, max=799.17 MB, blocks=5024, accesses=12189396, hits=127592, hitRatio=1.04%%, cachingAccesses=132480, cachingHits=126949, cachingHitsRatio=95.82%%, evictions=0, evicted=0, evictedPerRun=NaN 2012-04-30 18:56:59,026 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/.tmp/391890051647401997 to hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168 2012-04-30 18:56:59,034 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168, entries=476418, sequenceid=880198761, memsize=138.1m, filesize=5.7m 2012-04-30 18:56:59,097 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~138.1m for region agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805. in 27230ms, sequenceid=880198761, compaction requested=false ~ [2] last lines from GC log: 2012-04-30T18:58:46.683+: 105717.791: [GC 105717.791: [ParNew: 35638K-1118K(38336K), 0.0548970 secs] 3145651K-3111412K(4091776K) icms_dc=6 , 0.0550360 secs] [Times: user=0.08 sys=0.00, real=0.09 secs] 2012-04-30T18:58:46.961+: 105718.069: [GC 105718.069: [ParNew: 35230K-2224K(38336K), 0.0802440 secs] 3145524K-3112533K(4091776K) icms_dc=6 , 0.0803810 secs] [Times: user=0.06 sys=0.00, real=0.13 secs] 2012-04-30T18:58:47.114+: 105718.222: [CMS-concurrent-mark: 8.770/80.230 secs] [Times: user=61.34 sys=5.69, real=80.23 secs]
Re: HBaseAdmin needs a close methord
Hi, fwiw, the close method was added in HBaseAdmin for HBase 0.90.5. N. On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee softse@gmail.com wrote: I don't think this issue can resovle the problem ZKWatcher is removed,but the configuration and HConnectionImplementation objects are still in HConnectionManager this may still cause memery leak but calling HConnectionManager.**deleteConnection may resolve HBASE-5073 problem. I can see if (this.zooKeeper != null) { LOG.info(Closed zookeeper sessionid=0x + Long.toHexString(this.**zooKeeper.getZooKeeper().** getSessionId())); this.zooKeeper.close(); this.zooKeeper = null; } in HConnectionImplementation.**close which is called by HConnectionManager.**deleteConnection Hi Lee Is HBASE-5073 resolved in that release? Regards Ram -Original Message- From: Eason Lee [mailto:softse@gmail.com] Sent: Thursday, April 19, 2012 10:40 AM To: user@hbase.apache.org Subject: Re: HBaseAdmin needs a close methord I am using cloudera's cdh3u3 Hi Lee Which version of HBase are you using? Regards Ram -Original Message- From: Eason Lee [mailto:softse@gmail.com] Sent: Thursday, April 19, 2012 9:36 AM To: user@hbase.apache.org Subject: HBaseAdmin needs a close methord Resently, my app meets a problem list as follows Can't construct instance of class org/apache/hadoop/hbase/**client/HBaseAdmin Exception in thread Thread-2 java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.**java:640) at org.apache.zookeeper.**ClientCnxn.start(ClientCnxn.**java:414) at org.apache.zookeeper.**ZooKeeper.init(ZooKeeper.**java:378) at org.apache.hadoop.hbase.**zookeeper.ZKUtil.connect(** ZKUtil.java:97) at org.apache.hadoop.hbase.**zookeeper.ZooKeeperWatcher.** init(ZooKeeperWatc her.java:119) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.getZooKeeperWatcher(**HConnectionManager.java:1002) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.setupZookeeperTrackers(**HConnectionManager.java:304) at org.apache.hadoop.hbase.**client.HConnectionManager$** HConnectionImplementa tion.init(**HConnectionManager.java:295) at org.apache.hadoop.hbase.**client.HConnectionManager.** getConnection(HConnec tionManager.java:157) at org.apache.hadoop.hbase.**client.HBaseAdmin.init(** HBaseAdmin.java:90) Call to org.apache.hadoop.hbase.**HBaseAdmin::HBaseAdmin failed! My app create HBaseAdmin every 30s,and the threads used by my app increases about 1thread/30s.See from the stack, there is only one HBaseAdmin in Memory, but lots of Configuration and HConnectionImplementation instances. I can see from the sources, everytime when HBaseAdmin is created, a new Configuration and HConnectionImplementation is created and added to HConnectionManager.HBASE_**INSTANCES.Sohttp://HConnectionManager.HBASE_INSTANCES.Sothey are not collected by gc when HBaseAdmin is collected. So i think we need to add a close methord to remove the Configuration**HConnectionImplementation from HConnectionManager.HBASE_**INSTANCES.Just as follows: public void close(){ HConnectionManager.**deleteConnection(**getConfiguration(), true); }
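A minimal sketch for clients stuck on a release older than 0.90.5 (where HBaseAdmin has no close()): drop the cached connection explicitly when done, as proposed in the thread. On 0.90.5 and later, calling admin.close() is the simpler option.

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // ... admin calls ...
    } finally {
      // drops the cached HConnectionImplementation (and its ZooKeeper watcher)
      // that was registered for this Configuration in HConnectionManager
      HConnectionManager.deleteConnection(conf, true);
    }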
Re: TIMERANGE performance on uniformly distributed keyspace
Hi, For the filtering part, every HFile is associated to a set of meta data. This meta data includes the timerange. So if there is no overlap between the time range you want and the time range of the store, the HFile is totally skipped. This work is done in StoreScanner#selectScannersFrom Cheers, N. On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil doug.m...@explorysmedical.comwrote: Hi there- With respect to: * Does it need to hit every memstore and HFile to determine if there isdata available? And if so does it need to do a full scan of that file to determine the records qualifying to the timerange, since keys are stored lexicographically? And... Using scan 'table', {TIMERANGE = [t, t+x]} : See... http://hbase.apache.org/book.html#regions.arch 8.7.5.4. KeyValue The timestamp is an attribute of the KeyValue, but unless you perform a restriction using start/stop row it have to process every row. Major compactions don't change this fact, they just change the number of HFiles that have to get processed. On 4/14/12 10:38 AM, Rob Verkuylen r...@verkuylen.net wrote: I'm trying to find a definitive answer to the question if scans on timerange alone will scale when you use uniformly distributed keys like UUIDs. Since the keys are randomly generated that would mean the keys will be spread out over all RegionServers, Regions and HFiles. In theory, assuming enough writes, that would mean that every HFile will contain the entire timerange of writes. Now before a major compaction, data is in the memstores and (non max.filesize) flushedmerged HFiles. I can imagine that a scan using a TIMERANGE can quickly serve from memstores and the smaller files, but how does it perform after a major compaction? Using scan 'table', {TIMERANGE = [t, t+x]} : * How does HBase handle this query in this case(UUIDs)? * Does it need to hit every memstore and HFile to determine if there is data available? And if so does it need to do a full scan of that file to determine the records qualifying to the timerange, since keys are stored lexicographically? I've run some tests on 300+ region tables, on month old data(so after major compaction) and performance/response seems fairly quick. But I'm trying to understand why that is, because hitting every HFile on every region seems to be ineffective. Lars' book figure 9-3 seems to indicate this as well, but cant seem to get the answer from the book or anywhere else. Thnx, Rob
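For reference, the Java equivalent of the shell scan discussed above (t and x are the hypothetical bounds from the thread; the usual client imports are assumed). The HFile time-range check done in StoreScanner#selectScannersFrom applies to this scan as well:

    Scan scan = new Scan();                 // no start/stop row, so the whole keyspace is considered
    scan.setTimeRange(t, t + x);            // [t, t + x): min inclusive, max exclusive, in ms
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // only rows with at least one KeyValue in the range are returned; HFiles whose
        // time-range metadata does not overlap [t, t + x) are skipped entirely
      }
    } finally {
      scanner.close();
    }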
Re: Zookeeper available but no active master location found
Hi, Literally, it means that ZooKeeper is there but the hbase client can't find the hbase master address in it. By default, the node used is /hbase/master, and it contains the hostname and port of the master. You can check its content in ZK by doing a get /hbase/master in bin/zkCli.sh (see http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_ConnectingToZooKeeper ). There should be a root cause for this, so it worths looking for other error messages in the logs (master especially). N. On Fri, Apr 13, 2012 at 1:23 AM, Henri Pipe henri.p...@gmail.com wrote: client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found Having a problem with master startup that I have not seen before. running the following packages: hadoop-hbase-0.90.4+49.137-1 hadoop-0.20-secondarynamenode-0.20.2+923.197-1 hadoop-hbase-thrift-0.90.4+49.137-1 hadoop-zookeeper-3.3.4+19.3-1 hadoop-0.20-datanode-0.20.2+923.197-1 hadoop-0.20-namenode-0.20.2+923.197-1 hadoop-0.20-tasktracker-0.20.2+923.197-1 hadoop-hbase-regionserver-0.90.4+49.137-1 hadoop-zookeeper-server-3.3.4+19.3-1 hadoop-0.20-0.20.2+923.197-1 hadoop-0.20-jobtracker-0.20.2+923.197-1 hadoop-hbase-master-0.90.4+49.137-1 [root@ip-10-251-27-130 logs]# java -version java version 1.6.0_31 Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) I start master and region server on another node. Master is initialized, but as soon as I try to check the master_status or do a zkdump via web interface, it blows up with: 2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found 2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: getMaster attempt 10 of 10 failed; retrying after sleep of 16000 I am running three zookeepers: # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=10 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=5 # the directory where the snapshot is stored. dataDir=/mnt/zookeeper # The maximum number of zookeeper client connections maxClientCnxns=2000 # the port at which the clients will connect clientPort=2181 server.1=10.251.27.130:2888:3888 server.2=10.250.9.220:2888:3888 server.3=10.251.110.50:2888:3888 I can telnet to the zookeepers just fine. Here is my hbase-site.xml file: configuration property namehbase.rootdir/name valuehdfs://namenode:9000/hbase/value /property property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.zookeeper.quorum/name value10.251.27.130,10.250.9.220,10.251.110.50/value /property property namehbase.zookeeper.property.dataDir/name value/hadoop/zookeeper/data/value /property property namehbase.zookeeper.property.maxClientCnxns/name value2000/value finaltrue/final /property /configuration Any thoughts? Any help is greatly appreciated. Thanks Henri Pipe
Re: can hbase-0.90.2 work with zookeeper-3.3.4?
Hi, It should. I haven't tested 0.90, but I tested hbase trunk a few months ago against ZK 3.4.x and ZK 3.3.x and it was working. N. 2012/4/5 lulynn_2008 lulynn_2...@163.com Hi, I found that hbase-0.90.2 uses zookeeper-3.4.2. Can this version of hbase work with zookeeper-3.3.4? Thank you.
Re: Hbase RegionServer stalls on initialization
Then you should have an error in the master logs. If not, it worths checking that the master the region servers speak to the same ZK... As it's hbase related, I redirect the question to hbase user mailing list (hadoop common is in bcc). On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote: The master is up. is it possible that zookeeper might not know about it? *Nabib El-Rahman *| Senior Sofware Engineer *M:* 734.846.25 734.846.2529 www.tubemogul.com | *twitter: @nabiber* http://www.tubemogul.com/ http://www.tubemogul.com/ On Mar 28, 2012, at 10:42 AM, N Keywal wrote: It must be waiting for the master. Have you launched the master? On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman nabib.elrah...@tubemogul.com wrote: Hi Guys, I'm starting up an region server and it stalls on initialization. I took a thread dump and found it hanging on this spot: regionserver60020 prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in Object.wait() [0x7fa9101b4000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xbc63b2b8 (a org.apache.hadoop.hbase.MasterAddressTracker) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122) - locked 0xbc63b2b8 (a org.apache.hadoop.hbase.MasterAddressTracker) at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493) at org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560) at java.lang.Thread.run(Thread.java:662) Any Idea on who or what its being blocked on? *Nabib El-Rahman *| Senior Sofware Engineer *M:* 734.846.2529 www.tubemogul.com | *twitter: @nabiber* http://www.tubemogul.com/ http://www.tubemogul.com/
Re: HBase schema model question.
Hi, Just a few... See http://hbase.apache.org/book.html#number.of.cfs N. On Tue, Mar 20, 2012 at 12:39 PM, Manish Bhoge manishbh...@rocketmail.com wrote: Very basic question: how many column families are possible in a table in HBase? I know you can have thousands of columns in a family, but I don't know how many families are possible. So far I haven't seen more than one in the examples. Thanks Manish Sent from my BlackBerry, pls excuse typo
Re: Streaming data processing and hBase
Hi, The way you describe the in memory caching component, it looks very similar to HBase memstore. Any reason for not relying on it? N. On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Dear all, We are currently working on an architecture for a system that should be serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message). The archiving system should be capable of not only serving as a long term storage but also as a kind of streaming data processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises of: A message routing component that could route data to calculations and route calculation results to other components that are interested in these data. An in memory caching component that is used for storing up to 10 - 20 minutes of data before it is written to the long term archive. An hBase database that is used for the long term storage. MapReduce framework for doing analytics on the data stored in the hBase database. The complete system should be failsafe and reliable regarding component failures and it should scale with the number of computers that are utilized. Are there any suggestions or feedback to this approach from the community? and are there any suggestions which tools or systems to use for the message routing component and the in memory cache. Thanks for any help and suggestions all the best Christian 8--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 Munich, Germany Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme; Managing Board: Peter Loescher, Chairman, President and Chief Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
Re: Streaming data processing and hBase
Hi Christian, It's a component internal to HBase, so you don't have to use it directly. See http://hbase.apache.org/book/wal.html on how writes are handled by HBase to ensure reliability data distribution... Cheers, N. On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Hi Is this memstore replicated? Since we store a significant amount of data in the memory cache we need a replicated solution. Also I can't find lots of information besides a java api doc for the MemStore class. I will continue searching for this, but if you have any URL with more documentation please send it. Thanks in advance regards Christian 8-- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 München, Deutschland Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Gerhard Cromme; Vorstand: Peter Löscher, Vorsitzender; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Süß; Sitz der Gesellschaft: Berlin und München, Deutschland; Registergericht: Berlin Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322 -Ursprüngliche Nachricht- Von: N Keywal [mailto:nkey...@gmail.com] Gesendet: Freitag, 16. März 2012 18:02 An: user@hbase.apache.org Betreff: Re: Streaming data processing and hBase Hi, The way you describe the in memory caching component, it looks very similar to HBase memstore. Any reason for not relying on it? N. On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Dear all, We are currently working on an architecture for a system that should be serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message). The archiving system should be capable of not only serving as a long term storage but also as a kind of streaming data processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises of: A message routing component that could route data to calculations and route calculation results to other components that are interested in these data. An in memory caching component that is used for storing up to 10 - 20 minutes of data before it is written to the long term archive. An hBase database that is used for the long term storage. MapReduce framework for doing analytics on the data stored in the hBase database. The complete system should be failsafe and reliable regarding component failures and it should scale with the number of computers that are utilized. Are there any suggestions or feedback to this approach from the community? and are there any suggestions which tools or systems to use for the message routing component and the in memory cache. Thanks for any help and suggestions all the best Christian 8--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 Munich, Germany Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme; Managing Board: Peter Loescher, Chairman, President and Chief Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. 
Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
RE: Java Programming and Hbase
You will need the hadoop jar for this. Hbase uses hadoop for common stuff like the configuration you've seen, so even a simple client needs it. N. Le 12 mars 2012 12:06, Mahdi Negahi negahi.ma...@hotmail.com a écrit : Is it necessary to install hadoop for hbase, if want use Hbase in my laptop and use it via Java ? Date: Mon, 12 Mar 2012 10:43:44 +0100 Subject: Re: Java Programming and Hbase From: khi...@googlemail.com To: user@hbase.apache.org you also need to import hadoop.jar, since hbase runs on hahoop On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote: Dear Friends I try to write a simple application with Java and manipulate my Hbase table. so I read this post and try to follow it. http://hbase.apache.org/docs/current/api/index.html I use eclipse and add hbase-092.0.jar as external jar file for my project. but i have problem in the first line of guideline. the following code line Configuration config = HBaseConfiguration.create(); has a following error The type org.apache.hadoop.conf.Configuration cannot be resolved. It is indirectly referenced from required .class files and Configuration's package that eclipse want to add to my project is import javax.security.auth.login.Configuration; i think it is not an appropriate package. please advice me and refer me to new guideline.
Re: Java Programming and Hbase
only jar files. They are already in the hbase distrib (i.e. if you download hbase, you get the hadoop jar files you need). You just need to import them in your IDE. On Mon, Mar 12, 2012 at 1:05 PM, Mahdi Negahi negahi.ma...@hotmail.comwrote: I so confused. I must install Hadoop or use only jar files ? Date: Mon, 12 Mar 2012 12:46:09 +0100 Subject: RE: Java Programming and Hbase From: nkey...@gmail.com To: user@hbase.apache.org You will need the hadoop jar for this. Hbase uses hadoop for common stuff like the configuration you've seen, so even a simple client needs it. N. Le 12 mars 2012 12:06, Mahdi Negahi negahi.ma...@hotmail.com a écrit : Is it necessary to install hadoop for hbase, if want use Hbase in my laptop and use it via Java ? Date: Mon, 12 Mar 2012 10:43:44 +0100 Subject: Re: Java Programming and Hbase From: khi...@googlemail.com To: user@hbase.apache.org you also need to import hadoop.jar, since hbase runs on hahoop On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi negahi.ma...@hotmail.com wrote: Dear Friends I try to write a simple application with Java and manipulate my Hbase table. so I read this post and try to follow it. http://hbase.apache.org/docs/current/api/index.html I use eclipse and add hbase-092.0.jar as external jar file for my project. but i have problem in the first line of guideline. the following code line Configuration config = HBaseConfiguration.create(); has a following error The type org.apache.hadoop.conf.Configuration cannot be resolved. It is indirectly referenced from required .class files and Configuration's package that eclipse want to add to my project is import javax.security.auth.login.Configuration; i think it is not an appropriate package. please advice me and refer me to new guideline.
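A minimal sketch of the point above, showing the import that matters: the Configuration class used by HBaseConfiguration.create() lives in the hadoop jar shipped with the hbase distribution, not in javax.security.auth.login:

    import org.apache.hadoop.conf.Configuration;   // not javax.security.auth.login.Configuration
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "mytable");   // placeholder table name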
Re: Retrieve Column Family and Column with Java API
Hi, Yes and no. No, because as a table can have millions of columns and these columns can be different for every row, the only way to get all the columns is to scan the whole table. Yes, because if you scan the table you can have the columns names. See Result#getMap: it's organized by family -- qualifier -- version -- value And yes, because you can get the column families from the HTableDescriptor. N. On Mon, Mar 12, 2012 at 3:10 PM, Mahdi Negahi negahi.ma...@hotmail.comwrote: Dear All friends Is there any way to retrieve a table's column families and columns with Java. for example, i want to scan a table that i know only its name.
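A small sketch of both answers above, assuming an open HTable and the usual client imports: the families come from the HTableDescriptor, while the qualifiers can only be discovered by scanning and walking Result#getMap (family -- qualifier -- version -- value):

    HTableDescriptor desc = table.getTableDescriptor();
    for (HColumnDescriptor family : desc.getFamilies()) {
      System.out.println("family: " + family.getNameAsString());
    }
    // qualifiers are per row, so a full scan is needed to list them
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result r : scanner) {
      for (Map.Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> fam : r.getMap().entrySet()) {
        for (byte[] qualifier : fam.getValue().keySet()) {
          System.out.println(Bytes.toString(fam.getKey()) + ":" + Bytes.toString(qualifier));
        }
      }
    }
    scanner.close();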
Re: HBase-0.92.0 removed HBaseClusterTestCase, is there any replacement for this class
Hi, It's replaced by HBaseTestingUtility. Cheers, N. 2012/3/8 lulynn_2008 lulynn_2...@163.com Hi All, I am integrating flume-0.9.4 with hbase-0.92.0. And I find hbase-0.92.0 removed HBaseClusterTestCase which is used in flume-0.9.4. My question is: Is there any replacement for HBaseClusterTestCase? Thank you.
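A minimal sketch of what the replacement looks like (method names as in the 0.92 test jar; the table and family names are placeholders):

    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();                               // in-process ZK + HDFS + HBase
    HTable t = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("f"));
    // ... run the code under test against 't' ...
    util.shutdownMiniCluster();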
Re: 0.92 in mvn repository somewhere?
You cannot use the option -D*skipTests* ? On Wed, Feb 15, 2012 at 5:27 PM, Stack st...@duboce.net wrote: On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger ustaudin...@activequant.com wrote: Hi St.Ack, i don't wanna be a pain in the back, but any progress on this? You are not being a pain. I'm fumbling the mvn publishing, repeatedly. Its a little embarrassing which is why I'm not talking to much about it (smile). To publish to maven, we need to build ~3 (perhaps 4 times). Each build takes ~two hours. They can fail on an odd flakey test. Also, maven release can fail w/ an error code 1 and thats all she wrote so I try a few things to try and get over the error code 1.. it doesn't always happen (then I restart the two hour build). I'm doing this task in background so I forget about it from time to time (until you email me above). I promise to doc all I do to get it up there this time. I half did it last time: http://hbase.apache.org/book.html#mvn_repo Also, our build gets more sane in next versions taking 1/4 time. Sorry its taking so long, St.Ack
Re: Is it possible to connect HBase remotely?
Hi, The client needs to connect to zookeeper as well. You haven't set the parameters for zookeeper, so it goes with the default settings (localhost/2181), hence the error you're seeing. Set the zookeeper connection property in the client, it should work. This should do it: conf .set(hbase.zookeeper.quorum, 192.168.2.122); conf .set(hbase.zookeeper.property.clientPort, 2181); Cheers, N. On Wed, Feb 8, 2012 at 3:26 PM, shashwat shriparv dwivedishash...@gmail.com wrote: I have two machine on same network IPs are like *192.168.2.122* and * 192.168.2.133*, suppose hbase (stand alone mode) running on *192.168.2.122, *and i have eclipse or netbeans running on *192.168.2.133,* so i need to retrieve and put data to hbase running on other ip, till now what i have tried is creating a configuration for hbase inside my code like : Configuration conf = HBaseConfiguration.create(); conf.set(hbase.master, *192.168.2.122:9000*); HTable hTable = new HTable(conf, table); java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) 12/02/08 19:44:28 INFO zookeeper.ClientCnxn: Opening socket connection to server* localhost/127.0.0.1:2181* 12/02/08 19:44:28 WARN zookeeper.ClientCnxn: Session 0x1355d44ae6f0003 for server null, unexpected error, closing socket connection and attempting reconnect I a not able to understand why its trying to go to *localhost/ 127.0.0.1:2181 .* * * My host file configuration is follows : == 127.0.0.1 localhost 127.0.0.1 ubuntu.ubuntu-domain ubuntu 192.168.2.126 ubuntu 192.168.2.125 ubuntu1 192.168.2.106 ubuntu2 192.168.2.56 ubuntu3 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters == I am able to telnet to localhost:9000, 127.0.0.1:9000, myhostname:9000, but if i am trying to connect to my ip which is 1982.168.2.125 its not connecting : its saying connection reffused. What method should follow to achieve this(connect to HBase running on another pc on the same network). any tutorial link will be appreciated.
zookeeper 3.3/3.4 on hbase trunk
Hi, FYI. I've been doing some tests mixing zookeeper client/server versions on hbase trunk, by executing medium category unit tests against a standalone zookeeper server (mixing versions 3.3 and 3.4 is officially supported by Zookeeper, but it was worth checking). I tested Zookeeper server 3.3.4 and 3.4.2 with Zookeeper client API 3.3.4 and 3.4.2 (with some changes in hbase to make it build with the 3.3 API), meaning: Client 3.4.2 -- Server 3.4.2, Client 3.3.4 -- Server 3.4.2, Client 3.3.4 -- Server 3.3.4, Client 3.4.2 -- Server 3.3.4. Conclusions: - It works, except of course if you're activating secure login (the related unit tests will hang). - I had a strange random error with the 3.3.4 server (whatever the client version), but it seems to be linked only to the start/stop phase (the zookeeper server surviving a stop request). - It's difficult, from the client, to know what the zookeeper server version is. A zookeeper jira was created for this (ZOOKEEPER-1381). - If you use a 3.4.2 feature like multi on a 3.3 server, it hangs: once again, it's up to the developer/administrator to make sure he's not using something specific to the 3.4 server, hence jira 1381 if we want things like warnings or implementations optimized for a given server. Cheers, N.
Re: sequence number
Hi, Yes, each cell is associated with a long. By default it's a timestamp, but you can set it yourself when you create the put. It's stored everywhere. You've got a lot of information and links on this in the hbase book (http://hbase.apache.org/book.html#versions) Cheers, N. On Mon, Jan 30, 2012 at 9:38 PM, Noureddine BOUYAHIAOUI nour.bouyahia...@free.fr wrote: Hi, In my reading about HBase, I understand that the HRegionServer (n times HRegion) uses a sequence number (AtomicLong) to version each key/value stored in the WAL. Please can you give me some details about this notion, for example how the HRegionServer creates its sequence number, and why it is used. Is it considered a version identifier? Best regards. Noureddine Bouyahiaoui
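As an illustration of default versus explicit versions (a sketch, not from the thread; the table and column names are invented), a Put can either let the server timestamp the cell or carry an explicit long:

// Writing two versions of the same cell: one with the server-assigned
// timestamp, one with an explicit long chosen by the caller.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");   // hypothetical table

    // Implicit version: the current time is used as the timestamp.
    Put p1 = new Put(Bytes.toBytes("row1"));
    p1.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
    table.put(p1);

    // Explicit version: the caller supplies the long (here 42L).
    Put p2 = new Put(Bytes.toBytes("row1"));
    p2.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), 42L, Bytes.toBytes("v2"));
    table.put(p2);

    // Read several versions back.
    Get g = new Get(Bytes.toBytes("row1"));
    g.setMaxVersions(3);
    Result r = table.get(g);
    System.out.println(r);
    table.close();
  }
}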
Re: How to implement tests for python based application using Hbase-thrift interface
Hi Damien, Can't say for the Python stuff. You can reuse or extract what you need from HBaseTestingUtility in the hbase test package; this will allow you to start a full HBase mini cluster in a few lines of Java code. Cheers, N. On Mon, Jan 30, 2012 at 11:10 AM, Damien Hardy dha...@figarocms.fr wrote: Hello, I wrote some code in python using HBase as image storage. I want my code to be tested independently of a full external HBase architecture, so my question is: is there a howto on instantiating a temporary local mini cluster + thrift interface, in order to run python (or maybe other language) hbase-thrift based tests easily? Cheers, -- Damien
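A minimal sketch of those "few lines of Java" (the class and table names are hypothetical; starting the Thrift gateway itself is not shown here and would typically be done separately against the ZooKeeper port printed below):

// Start an in-process HBase mini cluster that external tests (python/thrift
// or otherwise) could be pointed at, then keep it alive until killed.
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.util.Bytes;

public class LocalMiniCluster {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();
    util.createTable(Bytes.toBytes("images"), Bytes.toBytes("cf"));

    // External clients need the ZooKeeper client port the mini cluster chose.
    System.out.println("zookeeper client port: "
        + util.getConfiguration().get("hbase.zookeeper.property.clientPort"));

    // Block forever so the cluster stays up for the external test run.
    Thread.currentThread().join();
  }
}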
Re: hbase heap size beyond 16G ?
If you're interested, some good slides on GC (slide 45 and after): http://www.azulsystems.com/sites/www.azulsystems.com/SpringOne2011_UnderstandingGC.pdf On Tue, Nov 8, 2011 at 11:25 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Concurrent GC (a.k.a. CMS) does not mean that there are no more pauses. The pauses are reduced to a minimum but can still happen, especially if the concurrent threads cannot finish their work under high pressure. The G1 collector in JDK 7.0 claims to be a better collector than CMS, but I presume tests will need to be done to validate this. BTW the CMS collector is the one that is recommended in the book. Mikael.S On Tue, Nov 8, 2011 at 11:57 PM, Sujee Maniyam su...@sujee.net wrote: Hi All, the HBase book by Lars warns that it is not recommended to set the heap size above 16G, because of 'stop the world' GC. Does this still apply? Especially with 'concurrent GC'? thanks Sujee http://sujee.net -- Mikael.S