Re: Throttle replication speed in case of datanode failure
Since this is a Hadoop question, it should be sent to user@hadoop.apache.org (which I'm now sending this to, with user@hbase in BCC). J-D On Thu, Jan 17, 2013 at 9:54 AM, Brennon Church bren...@getjar.com wrote: Hello, Is there a way to throttle the speed at which under-replicated blocks are copied across a cluster? Either limiting the bandwidth or the number of blocks per time period would work. I'm currently running Hadoop v1.0.1. I think the dfs.namenode.replication.work.multiplier.per.iteration option would do the trick, but that is in v1.1.0 and higher. Thanks. --Brennon
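For reference, the option Brennon names goes in hdfs-site.xml on the NameNode (Hadoop 1.1.0+); a minimal sketch, assuming the stock default of 2 and lowering it to slow re-replication:

<!-- hdfs-site.xml on the NameNode; a sketch, the value shown is an assumption -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <!-- blocks scheduled for re-replication per live datanode per iteration;
       lower means gentler on the network, 2 is the default -->
  <value>1</value>
</property>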
Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)
Fifth offense. Yuan Jin is out of the office. - I will be out of the office starting 06/22/2012 and will not return until 06/25/2012. I am out of Jun 21 Yuan Jin is out of the office. - I will be out of the office starting 04/13/2012 and will not return until 04/16/2012. I am out of Apr 12 Yuan Jin is out of the office. - I will be out of the office starting 04/02/2012 and will not return until 04/05/2012. I am out of Apr 2 Yuan Jin is out of the office. - I will be out of the office starting 02/17/2012 and will not return until 02/20/2012. I am out of Feb 16 On Mon, Jul 23, 2012 at 1:09 PM, Yuan Jin jiny...@cn.ibm.com wrote: I am out of the office until 07/25/2012. I am out of office. For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM) For CFM related things, you can contact Daniel(Liang SH Su/China/Contr/IBM) For TMB related things, you can contact Flora(Jun Ying Li/China/IBM) For TWB related things, you can contact Kim(Yuan SH Jin/China/IBM) For others, I will reply you when I am back. Note: This is an automated response to your message Reducer MapFileOutpuFormat sent on 24/07/2012 4:09:51. This is the only notification you will receive while this person is away.
Re: Hbase DeleteAll is not working
Please don't cross-post, your question is about HBase not MapReduce itself so I put mapreduce-user@ in BCC. 0.20.3 is, relative to the age of the project, as old as my grandmother, so you should consider upgrading to 0.90 or 0.92, which are both pretty stable. I'm curious about the shell behavior you are encountering. Would it be possible for you to show us the exact trace of what you are doing in the shell? To be clear, here's what I'd like to see:
- A get of the row you want to delete. Feel free to zero out the values.
- A deleteall of that row.
- Another get of that row.
- A delete of a column (that should work according to your email).
- A last get of that row.
Thx, J-D On Sun, May 13, 2012 at 9:57 PM, Mahesh Balija balijamahesh@gmail.com wrote: Hi, I am trying to delete the whole row from hbase in my production cluster in two ways: 1) I have written a mapreduce program to remove the many rows which satisfy a certain condition. The key is the hbase row key and the value is a Delete initialized with that key: Delete delete = new Delete(key.get()); context.write(key, delete); 2) From the command line I am trying to delete the selected record using the deleteall command. Neither of them is working, i.e., none of the records are being deleted from hbase, but if I separately delete the individual columns through the command line then the record is deleted once I remove all of them. My hbase version is hbase-0.20.3 and my hadoop version is 0.20.2. Please suggest whether I am doing anything wrong or whether this is known weird behavior of hbase? Thanks, Mahesh.B.
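For anyone wanting to produce the trace J-D asks for, it is just a shell session along these lines (table, row, and column names are hypothetical):

hbase(main):001:0> get 'mytable', 'myrow'
hbase(main):002:0> deleteall 'mytable', 'myrow'
hbase(main):003:0> get 'mytable', 'myrow'
hbase(main):004:0> delete 'mytable', 'myrow', 'family1:a'
hbase(main):005:0> get 'mytable', 'myrow'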
Re: Doubt from the book Definitive Guide
On Thu, Apr 5, 2012 at 7:03 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Only advantage I was thinking of was that in some cases reducers might be able to take advantage of data locality and avoid multiple HTTP calls, no? Data is anyway written, so the last merged file could go on HDFS instead of local disk. I am new to hadoop so just asking questions to understand the rationale behind using local disk for final output. So basically it's a tradeoff here: you get more replicas to copy from, but you have 2 more copies to write. Considering that that data is very short-lived and that it doesn't need to be replicated (since if the machine fails the maps are replayed anyway), it seems that writing 2 replicas that are potentially unused would be hurtful. Regarding locality, it might make sense on a small cluster, but the more nodes you add, the smaller the chance of having local replicas for each block of data you're looking for. J-D
Re: Fairscheduler - disable default pool
We do it here by setting this: <poolMaxJobsDefault>0</poolMaxJobsDefault> So that you _must_ have a pool (one that's configured with a different maxRunningJobs) in order to run jobs. Hope this helps, J-D On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com wrote: I know that by design all unmarked jobs go to that pool, however I am doing some testing and I am interested in whether it is possible to disable it.. Thanks
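For context, poolMaxJobsDefault lives in the fair scheduler's allocation file (the one mapred.fairscheduler.allocation.file points at); a minimal sketch, with a hypothetical pool name:

<?xml version="1.0"?>
<allocations>
  <!-- pools without an explicit maxRunningJobs (including the default pool)
       may run no jobs at all -->
  <poolMaxJobsDefault>0</poolMaxJobsDefault>
  <!-- jobs must be submitted to an explicitly configured pool to run -->
  <pool name="etl">
    <maxRunningJobs>10</maxRunningJobs>
  </pool>
</allocations>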
Re: Regarding Parallel Iron's claim
Isn't that old news? http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/ Googling around, it doesn't seem like anything happened after that. J-D On Thu, Dec 8, 2011 at 6:52 PM, JS Jang jsja...@gmail.com wrote: Hi, Does anyone know of any discussion in Apache Hadoop regarding the claim by Parallel Iron with their patent against the use of HDFS? Thanks in advance. Regards, JS
Re: Regarding Parallel Iron's claim
You could just look at the archives: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/ It is also indexed by all search engines. J-D On Thu, Dec 8, 2011 at 7:44 PM, JS Jang jsja...@gmail.com wrote: I appreciate your help, J-D. Yes, I wondered whether there was any update since, or previous discussion within Apache Hadoop, as I am new to this mailing list. On 12/9/11 12:19 PM, Jean-Daniel Cryans wrote: Isn't that old news? http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/ Googling around, it doesn't seem like anything happened after that. J-D On Thu, Dec 8, 2011 at 6:52 PM, JS Jang jsja...@gmail.com wrote: Hi, Does anyone know of any discussion in Apache Hadoop regarding the claim by Parallel Iron with their patent against the use of HDFS? Thanks in advance. Regards, JS -- 장정식 (JS Jang) / jsj...@gruter.com, Gruter Co., Ltd., Principal, R&D Team, www.gruter.com, Cloud, Search and Social
Re: Hadoop 0.21
Yep. J-D On Tue, Dec 6, 2011 at 10:41 AM, Saurabh Sehgal saurabh@gmail.com wrote: Hi All, According to the Hadoop release notes, version 0.21.0 should not be considered stable or suitable for production: 23 August, 2010: release 0.21.0 available This release contains many improvements, new features, bug fixes and optimizations. It has not undergone testing at scale and should not be considered stable or suitable for production. This release is being classified as a minor release, which means that it should be API compatible with 0.20.2. Is this still the case ? Thank you, Saurabh
Re: Version of Hadoop That Will Work With HBase?
For the record, this thread was started from another discussion in user@hbase. 0.20.205 does work with HBase 0.90.4; I think the OP was a little too quick saying it doesn't. J-D On Tue, Dec 6, 2011 at 11:44 AM, jcfol...@pureperfect.com wrote: Sadly, CDH3 is not an option although I wish it was. I need to get an official release of HBase from apache to work. I've tried every version of HBase 0.89 and up with 0.20.205 and all of them throw EOFExceptions. Which version of Hadoop core should I be using? HBase 0.94 ships with a 20-append version which doesn't work (it throws an EOFException), but when I tried replacing it with the hadoop-core included with hadoop 0.20.205 I still get the same exception. Thanks Original Message Subject: Re: Version of Hadoop That Will Work With HBase? From: Harsh J ha...@cloudera.com Date: Tue, December 06, 2011 2:32 pm To: common-user@hadoop.apache.org 0.20.205 should work, and so should CDH3 or 0.20-append branch builds (no longer maintained, after 0.20.205 replaced it though). What problem are you facing? Have you ensured HBase does not have a bad hadoop version jar in its lib/? On Wed, Dec 7, 2011 at 12:55 AM, jcfol...@pureperfect.com wrote: Hi, Can someone please tell me which versions of hadoop contain the 20-appender code and will work with HBase? According to the Hbase docs (http://hbase.apache.org/book/hadoop.html), Hadoop 0.20.205 should work with HBase but it does not appear to. Thanks! -- Harsh J
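For Harsh's lib/ check, the usual procedure is to make HBase run with the exact hadoop-core jar of the cluster; a sketch, assuming standard install paths:

# remove the hadoop jar bundled with HBase and drop in the cluster's own
rm $HBASE_HOME/lib/hadoop-core-*.jar
cp $HADOOP_HOME/hadoop-core-0.20.205.0.jar $HBASE_HOME/lib/
# restart HBase afterwards so the replacement jar is picked up

(The OP reports having tried a swap like this already; the sketch is only to make the suggestion concrete.)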
Re: Adjusting column value size.
(BCC'd common-user@ since this seems strictly HBase related) Interesting question... And you probably need all those ints at the same time right? No streaming? I'll assume no. So the second solution seems better due to the overhead of storing each cell. Basically, storing one int per cell you would end up storing more keys than values (size wise). Another thing is that if you pack enough ints together and there's some sort of repetition, you might be able to use LZO compression on that table. I'd love to hear about your experiments once you've done them. J-D On Mon, Oct 3, 2011 at 10:58 PM, edward choi mp2...@gmail.com wrote: Hi, I have a question regarding performance and column value size. I need to store several million integers per row. (Several million is important here) I was wondering which method would be more beneficial performance wise. 1) Store each integer in a single column so that when a row is called, several million columns will also be called. And the user would map each column value to some kind of container (ex: vector, arrayList) 2) Store, for example, a thousand integers in a single column (by concatenating them) so that when a row is called, only several thousand columns will come along. The user would have to split the column value into 4-byte chunks and map the resulting integers to some kind of container (ex: vector, arrayList) I am curious which approach would be better. 1) would call several million columns but no additional processing is needed. 2) would call only several thousand columns but additional processing is needed. Any advice would be appreciated. Ed
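A minimal sketch of option 2's pack/unpack step using the HBase Bytes utility (the helper class is illustrative, not code from the thread):

import org.apache.hadoop.hbase.util.Bytes;

public class IntPacking {
  // pack a batch of ints into one byte[] to store as a single cell value
  public static byte[] pack(int[] batch) {
    byte[] packed = new byte[batch.length * Bytes.SIZEOF_INT];
    for (int i = 0; i < batch.length; i++) {
      Bytes.putInt(packed, i * Bytes.SIZEOF_INT, batch[i]);
    }
    return packed;
  }

  // unpack a cell value back into ints on read
  public static int[] unpack(byte[] packed) {
    int[] out = new int[packed.length / Bytes.SIZEOF_INT];
    for (int i = 0; i < out.length; i++) {
      out[i] = Bytes.toInt(packed, i * Bytes.SIZEOF_INT);
    }
    return out;
  }
}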
Re: Using HBase for real time transaction
On Wed, Sep 21, 2011 at 8:36 AM, Jignesh Patel jign...@websoft.com wrote: I am not looking for a relational database, but at creating a multi-tenant database; at this time I am not sure whether it needs transactions, or even whether that kind of architecture can support transactions. Currently in HBase nothing prevents you from having multiple tenants, as long as they have different table names. Also keep in mind that there's no security implemented, but it *might* make it for 0.92 (crossing fingers). Row mutations in HBase are seen by the user as soon as they are done, atomicity is guaranteed at the row level, which seems to satisfy his requirement. If multi-row transactions are needed then I agree HBase might not be what he wants. Can't we handle transactions through the application or container, before data even goes to HBase? Sure, you could do something like what Megastore[1] does, but you really need to evaluate your needs and see if that works. And I do have one more doubt, how to handle low read latency? HBase offers that out of the box; a more precise question would be what 99th percentile read latency you need. Just for the sake of giving a data point, right now our 99p is 20ms but that's with our type of workload, machines, front end caching, etc, so YMMV. J-D 1. Megastore (transactions are described in chapter 3.3): http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
Re: Using HBase for real time transaction
While HBase isn't ACID-compliant, it does have some guarantees: http://hbase.apache.org/acid-semantics.html J-D On Tue, Sep 20, 2011 at 2:56 PM, Michael Segel michael_se...@hotmail.com wrote: Since Tom isn't technical... ;-) The short answer is No. HBase is not capable of being a transactional database because it doesn't support transactions. Nor is HBase ACID compliant. Having said that, yes you can use HBase to serve data in real time. HTH -Mike Subject: Re: Using HBase for real time transaction From: jign...@websoft.com Date: Tue, 20 Sep 2011 17:25:17 -0400 To: common-user@hadoop.apache.org Tom, Let me reword: can HBase be used as a transactional database (i.e. as a replacement for mysql)? The requirement is to have real time read and write operations. I mean as soon as data is written the user should see the data (here data should be written in Hbase). -Jignesh On Sep 20, 2011, at 5:11 PM, Tom Deutsch wrote: Real-time means different things to different people. Can you share your latency requirements from the time the data is generated to when it needs to be consumed, or how you are thinking of using Hbase in the overall flow? Tom Deutsch Program Director CTO Office: Information Management Hadoop Product Manager / Customer Exec IBM 3565 Harbor Blvd Costa Mesa, CA 92626-1420 tdeut...@us.ibm.com Jignesh Patel jign...@websoft.com 09/20/2011 12:57 PM Please respond to common-user@hadoop.apache.org To common-user@hadoop.apache.org cc Subject Using HBase for real time transaction We are exploring the possibility of using HBase for real time transactions. Is that possible? -Jignesh
Re: Using HBase for real time transaction
I think there has to be some clarification. The OP was asking about a mySQL replacement. HBase will never be an RDBMS replacement. No transactions means no way of doing OLTP. It's the wrong tool for that type of work. Agreed, if you are looking to handle relational data in a relational fashion, it might be better to look elsewhere. Recognize what HBase is and what it is not. Not sure what you're referring to here. This doesn't mean you can't take in or deliver data in real time, it can. So if you want to use it in a real time manner, sure. Note that like with other databases, you will have to do some work to handle real time data. I guess you would have to provide a specific use case on what you want to achieve in order to know if it's a good fit. He says: The requirement is to have real time read and write operations. I mean as soon as data is written the user should see the data (here data should be written in Hbase). Row mutations in HBase are seen by the user as soon as they are done, atomicity is guaranteed at the row level, which seems to satisfy his requirement. If multi-row transactions are needed then I agree HBase might not be what he wants. J-D
Re: Regarding design of HDFS
In order to have an answer to that sort of question, you first must prove that you did your own homework, e.g. write down what you think the answer is based on your observations and readings; then I'm sure someone will be happy to help you. J-D On Thu, Aug 25, 2011 at 1:04 AM, Sesha Kumar sesha...@gmail.com wrote: Hi all, I am trying to get a good understanding of how Hadoop works, for my undergraduate project. I have the following questions/doubts: 1. Why does the namenode store the blockmap (block to datanode mapping) in main memory for all files, even those that are not used? 2. Why can't the namenode move out a part of the blockmap from main memory to a secondary storage device when free space in main memory becomes scarce (due to a large number of files)? 3. Why can't the blockmap be constructed when a file is requested (by a client) and then be cached for later accesses?
Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?
Hey Tim, It looks like you are running with only 1 replica so my first guess is that you only have 1 datanode and it's writing to /tmp, which was cleaned at some point. J-D On Tue, May 17, 2011 at 5:13 PM, Time Less timelessn...@gmail.com wrote: I loaded data into HDFS last week, and this morning I was greeted with this on the web interface: WARNING : There are about 32 missing blocks. Please check the log or run fsck. I ran fsck and see several missing and corrupt blocks. The output is verbose, so here's a small sample:

/tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar: CORRUPT block blk_-5745991833770623132
/tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar: MISSING 1 blocks of total size 2945889 B
/user/hive/warehouse/player_game_stat/2011-01-15/datafile: CORRUPT block blk_1642129438978395720
/user/hive/warehouse/player_game_stat/2011-01-15/datafile: MISSING 1 blocks of total size 67108864 B

Sometimes the number of dots after the B is quite large (several lines long). Some of these are tmp files, but many are important. If this cluster were prod, I'd have some splaining to do. I need to determine what caused this corruption. Questions: What are the dots after the B? What is the significance of the number of them? Does anyone have suggestions where to start? Are there typical misconfigurations or issues that cause corruption / missing files? What is the log that the NameNode web interface refers to? Thanks for any info! I'm... nervous. :) -- Tim Ellis Riot Games
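For the "run fsck" part, the usual invocations look like this (paths are examples); for what it's worth, fsck prints a dot for each healthy file it scans, so long runs of dots between CORRUPT/MISSING lines are just healthy files going by:

hadoop fsck /                                # summary for the whole namespace
hadoop fsck / -files -blocks -locations      # per-file block and datanode detail
hadoop fsck /user/hive/warehouse -files      # narrow the check to one subtree

The log the web UI refers to is the NameNode's own log under the hadoop logs directory (typically logs/hadoop-<user>-namenode-<host>.log).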
Re: distcp problems going from hadoop-0.20.1 to -0.20.2
Errr really? Well shipping a bunch of hard drives should be faster. J-D On Apr 23, 2011 12:17 AM, Jonathan Disher jdis...@parad.net wrote: Aha, that works. Any ideas what kind of throughput I can expect, or suggestions for making this run as fast as possible? Obviously exact numbers will depend on cluster config, I won't bore you with the details, but... 10mbit? 100mbit? A gigabit? I've got about 112TB of data to move from the East coast to the West coast, and sooner would be better than later :) -j On Apr 22, 2011, at 10:38 PM, Jean-Daniel Cryans wrote: See Copying between versions of HDFS: http://hadoop.apache.org/common/docs/r0.20.2/distcp.html#cpver J-D On Fri, Apr 22, 2011 at 10:37 PM, Jonathan Disher jdis...@parad.net wrote: I have an existing cluster running hadoop-0.20.1, and I am migrating most of the data to a new cluster running -0.20.2. I am seeing this in the namenode logs when I try to run a distcp: @40004db263bf29c77134 WARN ipc.Server: Incorrect header or version mismatch from newNN:46111 got version 4 expected version 3 2011-04-23 05:30:55,999 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from oldNN:48750 got version 3 expected version 4 When I run my distcp, on either side, it dies with a java.io.IOException/java.io.EOFException. Ideas? Am I screwed? I really don't want to drop my new cluster down to 0.20.1. -j
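The version-crossing distcp behind that link reads the source over HFTP (the NameNode's HTTP port, 50070 by default) and is run on the destination cluster; a sketch with hypothetical hosts, ports, and paths:

hadoop distcp hftp://old-nn.example.com:50070/data hdfs://new-nn.example.com:8020/data

Throughput is bounded by the WAN link and the number of copy maps distcp launches, so the -m flag can be used to cap or raise the map count.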
Re: HDFS + ZooKeeper
This is a 1M$ question. You could start thinking about this problem by looking at what AvatarNode does: https://issues.apache.org/jira/browse/HDFS-976 J-D On Fri, Apr 22, 2011 at 10:17 PM, Ozcan ILIKHAN ilik...@cs.wisc.edu wrote: Hi, Does anyone have any idea about how we can use HDFS with ZooKeeper? More elaborately if NameNode fails DataNodes should be able to retrieve address of new NameNode from ZooKeeper. Thanks, - Ozcan ILIKHAN PhD Student, Graduate Research Assistant Department of Computer Sciences University of Wisconsin-Madison http://pages.cs.wisc.edu/~ilikhan
Re: Hadoop in Canada
(moving to general@ since this is not a question regarding the usage of the hadoop commons, which I BCC'd) I moved from Montreal to SF a year and a half ago because I saw two things: 1) companies weren't interested (they are still trying to get rid of COBOL or worse) or didn't have the data to use Hadoop (not enough big companies) and 2) the universities were either uninterested or just amused by this newcomer. I know of one company that really does cool stuff with Hadoop in Montreal and it's Hopper (www.hopper.travel, they are still in closed alpha AFAIK) who also organized hackreduce.org last weekend. This is what their CEO has to say to the question Is there something you would do differently now if you were to start it over?: Move to the Valley. (see the rest here http://nextmontreal.com/product-market-fit-hopper-travel-fred-lalonde/) I'm sure there are a lot of other companies that are either considering using or already using Hadoop to some extent in Canada but, like anything else, only a portion of them are interested in talking about it or even organizing an event. I would actually love to see something getting organized and I'd be on the first plane to Y**, but I'm afraid that to achieve any sort of critical mass you'd have to fly in people from all the provinces. Air Canada becomes a SPOF :P Now that I think about it, there's probably enough Canucks around here that use Hadoop that we could have our own little user group. If you want to have a nice vacation and geek out with us, feel free to stop by and say hi. /rant J-D On Tue, Mar 29, 2011 at 6:21 AM, James Seigel ja...@tynt.com wrote: Hello, You might remember me from a couple of weeks back asking if there were any Calgary people interested in a “meetup” about #bigdata or using hadoop. Well, I’ve expanded my search a little to see if any of my Canadian brothers and sisters are using the elephant for good or for evil. It might be harder to grab coffee, but it would be fun to see where everyone is. Shout out if you’d like or ping me, I think it’d be fun to chat! Cheers James Seigel Captain Hammer at Tynt.com
Re: google snappy
(Please don't cross-post like that, it only adds confusion. I put everything in bcc and posted to general instead) Their README says the following: Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression ratios. Somebody obviously needs to publish some benchmarks, but knowing Snappy's origin I can believe that claim. Relevant jiras: HADOOP-7206 Integrate Snappy compression HBASE-3691 Add compressor support for 'snappy', google's compressor J-D On Wed, Mar 23, 2011 at 9:52 AM, Weishung Chung weish...@gmail.com wrote: Hey my fellow hadoop/hbase developers, I just came across this google compression/decompression package yesterday, could we make good use of this compression scheme in hadoop? It's written in C++ though. http://code.google.com/p/snappy/ I haven't looked closely into this snappy package yet but I would love to know about the differences compared to LZO. Thank you, Wei Shung
Re: mapreduce streaming with hbase as a source
(moving to the hbase user ML) I think streaming used to work correctly in hbase 0.19 since the RowResult class was giving the value (which you had to parse out), but now that Result is made of KeyValue and they don't include the values in toString, I don't see how TableInputFormat could be used. You could write your own InputFormat that wraps around TIF and returns a specific format for each cell tho. Hope that somehow helps, J-D 2011/2/19 Ondrej Holecek ond...@holecek.eu: I don't think you understand me correctly, I get this line: 72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1, row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1} I know 72 6f 77 31 is the key and the rest is the value, let's call it the mapreduce-value. In this mapreduce-value there is row1/family1:a/1298037737154/Put/vlen=1, which is the hbase-row name, hbase-column name and hbase-timestamp. But I also expect the hbase-value. So my question is what to do to make TableInputFormat also send this hbase-value. Ondrej On 02/19/11 16:41, ShengChang Gu wrote: By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) will be the value. If there is no tab character in the line, then the entire line is considered the key and the value is null. However, this can be customized. Use: -D stream.map.output.field.separator=. -D stream.num.map.output.key.fields=4 2011/2/19 Ondrej Holecek ond...@holecek.eu Thank you, I've spent a lot of time debugging but didn't notice this typo :( Now it works, but I don't understand one thing. On stdin I get this:

72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1, row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1}
72 6f 77 32 keyvalues={row2/family1:a/1298037755440/Put/vlen=2, row2/family1:b/1298037758241/Put/vlen=2, row2/family1:c/1298037761198/Put/vlen=2}
72 6f 77 33 keyvalues={row3/family1:a/1298037767127/Put/vlen=3, row3/family1:b/1298037770111/Put/vlen=3, row3/family1:c/1298037774954/Put/vlen=3}

I see there is everything but the value. What should I do to get the value on stdin too? Ondrej On 02/18/11 20:01, Jean-Daniel Cryans wrote: You have a typo, it's hbase.mapred.tablecolumns not hbase.mapred.tablecolumn J-D On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek ond...@holecek.eu wrote: Hello, I'm testing hadoop and hbase. I can run mapreduce streaming or pipes jobs against text files on hadoop, but I have a problem when I try to run the same job against an hbase table.
The table looks like this:

hbase(main):015:0> scan 'table1'
ROW    COLUMN+CELL
 row1  column=family1:a, timestamp=1298037737154, value=1
 row1  column=family1:b, timestamp=1298037744658, value=2
 row1  column=family1:c, timestamp=1298037748020, value=3
 row2  column=family1:a, timestamp=1298037755440, value=11
 row2  column=family1:b, timestamp=1298037758241, value=22
 row2  column=family1:c, timestamp=1298037761198, value=33
 row3  column=family1:a, timestamp=1298037767127, value=111
 row3  column=family1:b, timestamp=1298037770111, value=222
 row3  column=family1:c, timestamp=1298037774954, value=333
3 row(s) in 0.0240 seconds

And the command I use, with the exception I get:

# hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar -D hbase.mapred.tablecolumn=family1: -input table1 -output /mtestout45 -mapper test-map -numReduceTasks 1 -reducer test-reduce -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat
packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] [] /tmp/streamjob8218197708173702571.jar tmpDir=null
11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117
Re: mapreduce streaming with hbase as a source
You have a typo, it's hbase.mapred.tablecolumns not hbase.mapred.tablecolumn J-D On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek ond...@holecek.eu wrote: Hello, I'm testing hadoop and hbase. I can run mapreduce streaming or pipes jobs against text files on hadoop, but I have a problem when I try to run the same job against an hbase table. The table looks like this:

hbase(main):015:0> scan 'table1'
ROW    COLUMN+CELL
 row1  column=family1:a, timestamp=1298037737154, value=1
 row1  column=family1:b, timestamp=1298037744658, value=2
 row1  column=family1:c, timestamp=1298037748020, value=3
 row2  column=family1:a, timestamp=1298037755440, value=11
 row2  column=family1:b, timestamp=1298037758241, value=22
 row2  column=family1:c, timestamp=1298037761198, value=33
 row3  column=family1:a, timestamp=1298037767127, value=111
 row3  column=family1:b, timestamp=1298037770111, value=222
 row3  column=family1:c, timestamp=1298037774954, value=333
3 row(s) in 0.0240 seconds

And the command I use, with the exception I get:

# hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar -D hbase.mapred.tablecolumn=family1: -input table1 -output /mtestout45 -mapper test-map -numReduceTasks 1 -reducer test-reduce -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat
packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] [] /tmp/streamjob8218197708173702571.jar tmpDir=null
11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:597)
 at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:926)
 at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:918)
 at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:834)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:767)
 at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:922)
 at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 23 more
Caused by: java.lang.NullPointerException
 at org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
 ... 28 more

Can anyone tell me what I am doing wrong? Regards, Ondrej
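With the one-character fix applied, the invocation from the original message becomes (same jar, paths, and scripts as above; only the -D property name changes):

hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
  -D hbase.mapred.tablecolumns=family1: \
  -input table1 -output /mtestout45 \
  -mapper test-map -numReduceTasks 1 -reducer test-reduce \
  -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat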
Re: HBase crashes when one server goes down
Please use the hbase mailing list for HBase-related questions. Regarding your issue, we'll need more information to help you out. Have you checked the logs? If you see exceptions in there, did you google them trying to figure out what's going on? Finally, does your setup meet all the requirements? http://hbase.apache.org/notsoquick.html#requirements J-D On Mon, Feb 14, 2011 at 9:49 AM, Rodrigo Barreto rodbarr...@gmail.com wrote: Hi, We are new to Hadoop; we have just configured a cluster with 3 servers and everything is working ok except when one server goes down: Hadoop / HDFS continues working but HBase stops, and queries do not return results until we restart HBase. The HBase configuration is copied below, please help us.

## HBASE-SITE.XML ###
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:54310/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

Thanks, Rodrigo Barreto.
Re: User History Location
For cloudera-related questions, please use their mailing lists. J-D 2011/2/11 Alexander Schätzle schae...@informatik.uni-freiburg.de: Hello, I'm a little bit confused about the right key for specifying the User History Location in CDH3B3 (which is Hadoop 0.20.2+737). Could anybody please give me a short answer which key is the right one and which configuration file is the right one to place the key? 1) mapreduce.job.userhistorylocation ? 2) hadoop.job.history.user.location ? Is the mapred-site.xml the right config-file for this key? Thx a lot! Best regards, Alexander Schätzle University of Freiburg, Germany
Re: Single Job to put Data into Hbase+MySQL
Do both insertions in your reducer by either not using the output formats at all or using one of them and doing the other insert by hand. J-D On Wed, Oct 27, 2010 at 1:44 PM, Shuja Rehman shujamug...@gmail.com wrote: Hi Folks, I am wondering if anyone has the answer to this question. I am processing log files using MapReduce and need to put some of the resulting data into mysql and the rest into hbase. At the moment, I am running two separate jobs to do this, so the same file is read twice to dump the data. My question is: is it possible to run a single job to achieve this? -- Regards Shuja-ur-Rehman Baig http://pk.linkedin.com/in/shujamughal Cell: +92 3214207445
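A sketch of what J-D describes, against the 0.90-era client APIs: use no output format at all and write to both sinks from the reducer. The class name, table names, column family, and JDBC URL are all hypothetical; this is illustrative, not the thread's actual code.

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DualSinkReducer extends Reducer<Text, Text, NullWritable, NullWritable> {
  private HTable table;
  private Connection db;
  private PreparedStatement insert;

  @Override
  protected void setup(Context ctx) throws IOException {
    table = new HTable(HBaseConfiguration.create(ctx.getConfiguration()), "logs");
    try {
      db = DriverManager.getConnection("jdbc:mysql://dbhost/stats", "user", "pass");
      insert = db.prepareStatement("INSERT INTO summary (k, v) VALUES (?, ?)");
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException {
    for (Text v : values) {
      // HBase side: one Put per value
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("raw"), Bytes.toBytes(v.toString()));
      table.put(put);
      // MySQL side: a plain JDBC insert done by hand
      try {
        insert.setString(1, key.toString());
        insert.setString(2, v.toString());
        insert.executeUpdate();
      } catch (SQLException e) {
        throw new IOException(e);
      }
    }
  }

  @Override
  protected void cleanup(Context ctx) throws IOException {
    table.close();
    try { db.close(); } catch (SQLException e) { throw new IOException(e); }
  }
}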
Client hanging 20 seconds after job's over (WAS: Re: Can I run HBase 0.20.6 on Hadoop 0.21?)
(adding mapreduce-user@ and re-scoping title) Can you jstack the client while it's waiting 20 seconds? Is it still waiting for the job to come back or is it something else? Is the job itself done cleaning up 20 seconds before the call returns on the client side (check the web ui)? J-D On Mon, Sep 27, 2010 at 12:10 PM, Pete Tyler peteralanty...@gmail.com wrote: Thanks for the offer, much appreciated. I have a very simple mapreduce job on a pseudo distributed system. I have a very small amount of persisted data. Running locally the mapreduce job runs very quickly, less than three seconds. When I run the job against the pseudo distributed hadoop, still on the same machine, as the client then I see the following: - the map and reduce classes run very quickly, a matter of millis in total ... sweet - the client blocks waiting for the job to finish for about 20 seconds ... very slow I'm trying to understand why I have this 20 second overhead and what I can do about it. My map and reduce classes are in my Hadoop classpath. On Sep 27, 2010, at 11:32 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Using 0.21.0 may reveal newer bugs rather than fixing your older ones. Maybe we can help you debug 0.20.2, what are you seeing? J-D
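Taking the dump J-D asks for needs nothing beyond the JDK tools (the pid is whatever jps reports for the hanging client):

jps -l                        # find the client JVM's pid
jstack <pid> > client.jstack  # dump all thread stacks while it sits in the 20s wait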
Re: State of high availability in Hadoop 0.20.1
It's the same. J-D On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote: Just to clarify, I mean the NameNode high availability. Regards. On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. What is the state of high-availability in Hadoop 0.20.1? In Hadoop 0.18.3 the only option was doing DRBD, has anything changed in 0.20.1? Regards.
Re: State of high availability in Hadoop 0.20.1
The Backup Namenode will be in 0.21 but it's not a complete NN HA solution (far from that): https://issues.apache.org/jira/browse/HADOOP-4539 Dhruba at Facebook has an AvatarNode for 0.20: https://issues.apache.org/jira/browse/HDFS-976 And the umbrella issue for NN availability is: https://issues.apache.org/jira/browse/HDFS-1064 J-D On Thu, Jun 24, 2010 at 10:10 AM, Stas Oskin stas.os...@gmail.com wrote: Hi. The check-point node is expected to be included in 0.21? Regards. On Thu, Jun 24, 2010 at 7:47 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It's the same. J-D On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote: Just to clarify, I mean the NameNode high availability. Regards. On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. What is the state of high-availability in Hadoop 0.20.1? In Hadoop 0.18.3 the only option was doing DRBD, has anything changed in 0.20.1? Regards.
Re: Error opening job jar
This isn't an HBase question, this is for mapreduce-user@hadoop.apache.org J-D On Tue, Jun 15, 2010 at 8:21 AM, yshintre1982 yshintre1...@yahoo.in wrote: i am running the wordcount example on linux vmware on hadoop. i get the following exception:

Exception in thread "main" java.io.IOException: Error opening job jar: /usr/yogesh/wordcount.jar
 at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
 at java.util.zip.ZipFile.open(Native Method)
 at java.util.zip.ZipFile.<init>(Unknown Source)
 at java.util.jar.JarFile.<init>(Unknown Source)
 at java.util.jar.JarFile.<init>(Unknown Source)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

what would be wrong, plz help... -- View this message in context: http://old.nabble.com/Error-opening-job-jar-tp28892690p28892690.html Sent from the HBase User mailing list archive at Nabble.com.
Re: HBase client hangs after upgrade to 0.20.4 when used from reducer
info to pastebin. I did the following sequence (with HBase 0.20.4):
- startup HBase (waited for all the regions to come online and let it settle)
- startup our application
- wait for the importer job to hang (it only happened on the second run, which started 15 reducers; the first run was really small and only one key was generated, so just one reducer)
- kill the hanging importer job (hadoop job -kill)
- try to shutdown HBase (as I type this it is still producing dots on my console)
The HBase master logs are here (includes shutdown attempt): http://pastebin.com/PYpPVcyK The jstacks are here:
- HMaster: http://pastebin.com/Da6jCAuA (this includes two thread dumps, one during operation with the hanging clients and one during hanging shutdown)
- RegionServer 1: http://pastebin.com/5dQXfxCn
- RegionServer 2: http://pastebin.com/XWwBGXYC
- RegionServer 3: http://pastebin.com/mDgWbYGV
- RegionServer 4: http://pastebin.com/XDR14bth
As you can see in the master logs, the shutdown cannot get a thread called Thread-10 to stop running. The trace for that thread looks like this:

"Thread-10" prio=10 tid=0x4d218800 nid=0x1e73 in Object.wait() [0x427a7000]
 java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x2aaab364c9d0> (a java.lang.Object)
 at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:89)
 - locked <0x2aaab364c9d0> (a java.lang.Object)
 at org.apache.hadoop.hbase.Chore.run(Chore.java:76)

I still have no clue what happened, but I will investigate a bit more tomorrow. Thanks for the responses. Friso On May 12, 2010, at 9:02 PM, Todd Lipcon wrote: Hi Friso, Also, if you can capture a jstack of the regionservers at the time that would be great. -Todd On Wed, May 12, 2010 at 9:26 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Friso, Unfortunately it's hard to determine the cause with the provided information, the client call you pasted is pretty much normal i.e. the client is waiting to receive a result from a region server. The fact that you can't shut down the master when this happens is very concerning. Do you still have those logs around? Same for the region servers? Can you post this in pastebin or on a web server? Also, feel free to come chat with us on IRC, it's always easier to debug when live. #hbase on freenode J-D On Wed, May 12, 2010 at 8:31 AM, Friso van Vollenhoven fvanvollenho...@xebia.com wrote: Hi all, I am using Hadoop (0.20.2) and HBase to periodically import data (every 15 minutes). There are a number of import processes, but generally they all create a sequence file on HDFS, which is then run through a MapReduce job. The MapReduce uses the identity mapper (the input file is a Hadoop sequence file) and a specialized reducer that does the following:
- Combine the values for a key into one value
- Do a Get from HBase to retrieve existing values for the same key
- Combine the existing value from HBase and the new one into one value again
- Put the final value into HBase under the same key (thus 'overwrite' the existing row; I keep only one version)
After I upgraded HBase to the 0.20.4 release, the reducers sometimes start hanging on a Get. When the jobs start, some reducers run to completion fine, but after a while the last reducers will start to hang. Eventually the reducers are killed off by Hadoop (after 600 secs). I did a thread dump for one of the hanging reducers.
It looks like this:

"main" prio=10 tid=0x48083800 nid=0x4c93 in Object.wait() [0x420ca000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x2eb50d70> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
 - locked <0x2eb50d70> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
 at $Proxy2.get(Unknown Source)
 at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
 at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
 at net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
 at net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
 at net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60
Re: Enabling Indexing in HBase
Yes, you can also create a HBaseConfiguration object and configure it with those exact configs (which you then provide to HTable). J-D On Wed, May 12, 2010 at 1:22 AM, Michelan Arendse miche...@addynamo.com wrote: Thank you. I have added the configuration folder to my client class path and it worked. Now I am faced with another issue: since this application will be used in ColdFusion, is there a way of making this work without having the configuration as part of the class path? -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 11 May 2010 06:26 PM To: hbase-user@hadoop.apache.org Subject: Re: Enabling Indexing in HBase Per http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview your client has to know where your zookeeper setup is. Since you want to use HBase in a distributed fashion, that means you went through http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib and this is where the required configs are. It could be made more obvious tho. J-D On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse miche...@addynamo.com wrote: Thanks. I have added that to the class path, but I still get an error. This is the error that I get:

10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=6 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@b0ce8f
java.net.ConnectException: Connection refused: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during shutdown input

I'm working off a server and not standalone mode; where would I change a setting that tells the connectString to point to the server instead of localhost? -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 10 May 2010 07:05 PM To: hbase-user@hadoop.apache.org Subject: Re: Enabling Indexing in HBase Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in your class path? J-D On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com wrote: Hi. I added the following properties to hbase-site.xml:

<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>

I'm using hbase 0.20.3 and when I start hbase now it comes with the following:

ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
java.lang.UnsupportedOperationException: Unable to find region server interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.ipc.IndexedRegionInterface

Can you please help with this problem that I am having. Thank you, Michelan Arendse Junior Developer | AD:DYNAMO // happy business ;-) Office 0861 Dynamo (0861 396266) | Fax +27 (0) 21 465 2587 Advertise Online Instantly - www.addynamo.com
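A minimal sketch of the programmatic alternative J-D opens with, against the 0.20-era client API (the quorum hosts and table name are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class NoClasspathConfig {
  public static void main(String[] args) throws IOException {
    // set in code the same keys that would otherwise come from hbase-site.xml
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "mytable");
    // ... use the table as usual
  }
}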
Re: Enabling Indexing in HBase
Per http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview your client has to know where your zookeeper setup is. Since you want to use HBase in a distributed fashion, that means you went through http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib and this is where the required configs are. It could be made more obvious tho. J-D On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse miche...@addynamo.com wrote: Thanks. I have added that to the class path, but I still get an error. This is the error that I get:

10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=6 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@b0ce8f
java.net.ConnectException: Connection refused: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during shutdown input

I'm working off a server and not standalone mode; where would I change a setting that tells the connectString to point to the server instead of localhost? -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: 10 May 2010 07:05 PM To: hbase-user@hadoop.apache.org Subject: Re: Enabling Indexing in HBase Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in your class path? J-D On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com wrote: Hi. I added the following properties to hbase-site.xml:

<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>

I'm using hbase 0.20.3 and when I start hbase now it comes with the following:

ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
java.lang.UnsupportedOperationException: Unable to find region server interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.ipc.IndexedRegionInterface

Can you please help with this problem that I am having. Thank you, Michelan Arendse Junior Developer | AD:DYNAMO // happy business ;-) Office 0861 Dynamo (0861 396266) | Fax +27 (0) 21 465 2587 Advertise Online Instantly - www.addynamo.com
Re: Deprecated Table Map in Hbase-0.20.3
Two things: First, TableMap was using the raw type instead of a generic one; this was fixed in https://issues.apache.org/jira/browse/HBASE-876 Then it wasn't generic enough, so this was filed: https://issues.apache.org/jira/browse/HBASE-1725 That's the explanation. I remember having the same issue when I migrated my code to 0.20, but it's nothing you can't resolve; just inspect the compilation error messages and you'll figure it out. J-D On Mon, May 10, 2010 at 3:28 AM, bharath v bharathvissapragada1...@gmail.com wrote: Hey folks, I have a small question regarding the TableMap class. I know it is deprecated in 0.20.3, but the declaration was changed from

public interface TableMap<K extends WritableComparable, V extends Writable> extends Mapper<ImmutableBytesWritable, RowResult, K, V>

TO

public interface TableMap<K extends WritableComparable<? super K>, V extends Writable> extends Mapper<ImmutableBytesWritable, RowResult, K, V> {

Why is there the additional restriction on K of <? super K>? Because of this, my app written for 0.19.3 no longer compiles. Any suggestions or comments? Thanks
Re: Enabling Indexing in HBase
Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in your class path? J-D On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com wrote: Hi. I added the following properties to hbase-site.xml:

<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
</property>

I'm using hbase 0.20.3 and when I start hbase now it comes with the following:

ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
java.lang.UnsupportedOperationException: Unable to find region server interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.ipc.IndexedRegionInterface

Can you please help with this problem that I am having. Thank you, Michelan Arendse Junior Developer | AD:DYNAMO // happy business ;-) Office 0861 Dynamo (0861 396266) | Fax +27 (0) 21 465 2587 Advertise Online Instantly - www.addynamo.com
Re: Got some question for begin HBase (KeyValue, data structure)
Inline. J-D 1. How can I get the key name from a KeyValue? I use Bytes.toString(KeyValue.getKey()) but cannot get anything useful back. The javadoc of this method says: * Do not use unless you have to. Used internally for compacting and testing. The row key is given by http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Result.html#getRow() 2. Usually, what value do you set for the rowid? UUIDs or composite keys like timestamp+some_other_tags 3. How do you deploy the data structure from a development server to a production server? Copy over the DDL used in the shell. I think I need some information or a document on how to design the data structure on HBase. Can you share one with me? Google's Bigtable paper is always a good resource. The wiki has some tips (check the website). Else you can search this mailing list WRT your specific data model, you'll probably find what you need. Thanks Regards, Singo
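For question 1, a minimal sketch of reading the row key (and a cell value) off a Result with the 0.20-era client API, rather than via KeyValue.getKey(); the row, family, and qualifier names are hypothetical:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyExample {
  // prints the row key and one cell value; 'table' is an HTable opened elsewhere
  public static void printRow(HTable table) throws IOException {
    Result result = table.get(new Get(Bytes.toBytes("row1")));
    System.out.println(Bytes.toString(result.getRow()));  // the row key itself
    byte[] cell = result.getValue(Bytes.toBytes("family1"), Bytes.toBytes("a"));
    System.out.println(Bytes.toString(cell));             // that cell's value
  }
}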
Re: Searching rows for array of key values
If your row keys are sorted in a lexicographical way (padded with zeroes in your case since it's longs) then simply use a scanner: http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/Scan.html Configure it with a start and end row key, configure setCaching to the number of rows you need and it will do a single RPC to fetch everything very efficiently. The exact response time depends on your hardware, caching, and data size. J-D On Tue, May 4, 2010 at 3:16 PM, atreju n.atr...@gmail.com wrote: Hello, I am doing a research on HBase if we can use it efficiently in our company. I need to be able get/scan list of rows for an array of key values (sorted, long type). The array size will be 1,000 to 10,000. The table will have a few hundred million rows. What is the most efficient (fastest) way to get the list of rows for the requested row key values? Thanks.
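A minimal sketch of that scan against the 0.20 client API; the key range and column family are hypothetical, and the keys are assumed zero-padded so lexicographic order matches numeric order:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeFetch {
  // fetch all rows in a key range in few RPCs; 'table' is an HTable opened elsewhere
  public static void fetch(HTable table) throws IOException {
    Scan scan = new Scan(Bytes.toBytes("00000000000000001000"),   // start row, inclusive
                         Bytes.toBytes("00000000000000011000")); // stop row, exclusive
    scan.setCaching(10000); // rows per RPC, sized to the requested batch
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        String key = Bytes.toString(r.getRow());
        // keep only rows whose key is in the requested sorted array; skip the rest
      }
    } finally {
      scanner.close();
    }
  }
}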
Re: hbase.client.retries.number = 1 is bad
Trunk is a work in progress and the shell was recently redone. This configuration was set tentatively by the author of that change but, as you can see, it doesn't work very well! The jira is here https://issues.apache.org/jira/browse/HBASE-2352 J-D On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz mkur...@gmail.com wrote: Hi! I'm using a fresh version of trunk. I'm experiencing a problem where the invalid region locations are not removed from the cache of HCM. I'm only using scanners on the table and I receive the following errors: 2010-05-03 23:42:52,574 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg' 2010-05-03 23:42:52,574 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg in tableName Test5: location server 10.1.3.111:60020, location region name Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136 SEVERE: Trying to contact region server 10.1.3.111:60020 for region Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136, row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg', but failed after 1 attempts. Exceptions: java.net.ConnectException: Connection refused Which is expected as the 10.1.3.111:60020 regionserver was offline for hours at that time. The cause of this problem is that I set hbase.client.retries.number to 1 as I don't like the current retry options. In this case the following code at HConnectionManager.java:1061 callable.instantiateServer(tries != 0); will make scanners to always use the cache. This makes hbase.client.retries.number = 1 an unusable option. This is not intentional, am I correct? Am I forced to use the retries, or is there an other option? Also I would like to ask, when is it a good thing to retry an operation? In my experience there exists two kinds of failures 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline This can be due to a compaction, in which case we probably need to wait for a few seconds. Or it can be due to a split, in which case we might need to wait for minutes. Either case I would not want my client to wait for such long times when I could reschedule other things to do in that time. It is also possible that region has been transfered to an other regionserver but that is rare compared to the other cases. 2) java.net.ConnectException : regionserver is offline This is solved as soon as the master can reopen regions on an other regionserver, but still can take minutes. Anyway this exception is also rare(usually) Best regards, Miklos
Re: hbase.client.retries.number = 1 is bad
Yeah I understand that retries are unusable at that level, but you still want retries in order to be able to recalibrate the .META. cache right? So the semantic here is that 1 retry is in fact 1 try, using the cached information. https://issues.apache.org/jira/browse/HBASE-2445 is about reviewing those semantics in order to offer something more tangible to the users rather than a mix of number of retries and timeouts. Feel free to take a look and even a stab at this issue ;) J-D On Mon, May 3, 2010 at 3:25 PM, Miklós Kurucz mkur...@gmail.com wrote: This problem is not related to the shell. I checked 0.20.3 has the same code HConnectionManager.java:1034, I expect that to be broken too. Miklos 2010/5/4 Jean-Daniel Cryans jdcry...@apache.org: Trunk is a work in progress and the shell was recently redone. This configuration was set tentatively by the author of that change but, as you can see, it doesn't work very well! The jira is here https://issues.apache.org/jira/browse/HBASE-2352 J-D On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz mkur...@gmail.com wrote: Hi! I'm using a fresh version of trunk. I'm experiencing a problem where the invalid region locations are not removed from the cache of HCM. I'm only using scanners on the table and I receive the following errors: 2010-05-03 23:42:52,574 DEBUG org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing internal scanner to startKey at 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg' 2010-05-03 23:42:52,574 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit for row http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg in tableName Test5: location server 10.1.3.111:60020, location region name Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136 SEVERE: Trying to contact region server 10.1.3.111:60020 for region Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136, row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg', but failed after 1 attempts. Exceptions: java.net.ConnectException: Connection refused Which is expected as the 10.1.3.111:60020 regionserver was offline for hours at that time. The cause of this problem is that I set hbase.client.retries.number to 1 as I don't like the current retry options. In this case the following code at HConnectionManager.java:1061 callable.instantiateServer(tries != 0); will make scanners to always use the cache. This makes hbase.client.retries.number = 1 an unusable option. This is not intentional, am I correct? Am I forced to use the retries, or is there an other option? Also I would like to ask, when is it a good thing to retry an operation? In my experience there exists two kinds of failures 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline This can be due to a compaction, in which case we probably need to wait for a few seconds. Or it can be due to a split, in which case we might need to wait for minutes. Either case I would not want my client to wait for such long times when I could reschedule other things to do in that time. It is also possible that region has been transfered to an other regionserver but that is rare compared to the other cases. 2) java.net.ConnectException : regionserver is offline This is solved as soon as the master can reopen regions on an other regionserver, but still can take minutes. Anyway this exception is also rare(usually) Best regards, Miklos
Re: Hbase: GETs are very slow
Which version? How much heap was given to HBase? WRT block caching, I don't see how it could impact uploading in any way, you should enable it. What was the problem inserting 1B rows exactly? How were you running the upload? Are you making sure there's no swap on the machines? That kills java performance faster than you can say hbase ;) J-D On Fri, Apr 30, 2010 at 8:36 AM, Ruben Quintero rfq_...@yahoo.com wrote: Hi, I have a hadoop/hbase cluster running on 9 machines (only 8 GB RAM, 1 TB drives), and have recently noticed that Gets from Hbase have slowed down significantly. I'd say at this point I'm not getting more than 100/sec when using the Hbase Java API. DFS-wise, there's plenty of space left (using less than 10%), and all of the servers seem okay. The tables use LZO, and have blockcache disabled (we were having problems inserting up to a billion rows with it on, and read somewhere in the mailing list that disabling it might help). The primary table has only 4 million rows at the moment. I created a new test table with only 200,000 rows, and it was running at 100/sec as well. I'm not sure what the problem could be (paging?), or some configuration that can be adjusted? Any ideas? I can show our configuration if that's helpful, I just wasn't sure what info would be helpful and what would be extraneous. Thanks, - Ruben
Re: EC2 + Thrift inserts
Yeah more handlers won't do it here since there are tons of calls waiting on a single synchronized method, I guess the IndexedRegion should use a pool of HTables instead of a single one in order to improve indexation throughput. J-D On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas c...@email.com wrote: Here is the thread dump: I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper with no node using more than 10% CPU and no IO wait. It seems no matter how many mappers I throw at it, the total doesn't go much above 700 rows/second, which seems very, very slow to me. Here is the thread dump from a node: http://pastebin.com/U3eLRdMV I do see quite a bit of waiting and some blocking in there, not sure how exactly to interpret it all though. thanks for any help! -chris On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote: One thing to check is at the peak of your load, run jstack on one of the regionservers, and look at the handler threads - if all of them are doing something you might be running into handler contention. It is ultimately IO bound. -ryan On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas c...@email.com wrote: They are all at 100, but none of the regionservers are loaded - most are less than 20% CPU. Is this all network latency? -chris On Apr 29, 2010, at 8:29 PM, Ryan Rawson ryano...@gmail.com wrote: Every insert on an indexed table would require at the very least an RPC to a different regionserver. If the regionservers are busy, your request could wait in the queue for a moment. One param to tune would be the handler thread count. Set it to 100 at least. On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas c...@email.com wrote: I just finished some testing with JDK 1.6 u17 - so far no performance improvements with just changing that. Disabling LZO compression did gain a little bit (up to about 30/sec from 25/sec per thread). Turning off indexes helped the most - that brought me up to 115/sec @ 2875 total rows a second. A single perl/thrift process can load at over 350 rows/sec so it's not scaling as well as I would have expected, even without the indexes. Are the transactional indexes that costly? What is the bottleneck there? CPU utilization and network packets went up when I disabled the indexes, so I don't think those are the bottlenecks for the indexes. I was even able to add another 15 insert processes (total of 40) and only lost about 10% on per-process throughput. I could probably go even higher; none of the nodes are above 60% CPU utilization and IO wait was at most 3.5%. Each rowkey is unique, so there should not be any blocking on the row locks. I'll do more indexed tests tomorrow. thanks, -chris On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote: Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and you should be good to go. _18 is a botched release if I ever saw one. -Todd On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas c...@email.com wrote: Hi Stack, Thanks for looking. I checked the ganglia charts, no server was at more than ~20% CPU utilization at any time during the load test and swap was never used. Network traffic was light - just running a count through the hbase shell generates much higher use. On the server hosting meta specifically, it was at about 15-20% CPU, and IO wait never went above 3%, usually down near 0.
The load also died with a thrift timeout on every single node (each node connecting to localhost for its thrift server); it looks like a datanode just died and caused every thrift connection to time out - I'll have to up that limit to handle a node death. Checking logs, this appears in the log of the region server hosting meta; it looks like the dead datanode caused this error: 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_508630839844593817_11180java.io.IOException: Bad response 1 for block blk_508630839844593817_11180 from datanode 10.195.150.255:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423) The regionserver log on the dead node, 10.195.150.255, has some more errors in it: http://pastebin.com/EFH9jz0w I found this in the .out file on the datanode: # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 ) # Problematic frame: # V [libjvm.so+0x62263c] # # An error report file with more information is saved as: # /usr/local/hadoop-0.20.1/hs_err_pid1364.log # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # There is not a single error in the datanode's log though. Also of note - this happened well into the test, so the node dying caused the load to abort but not the prior
Re: EC2 + Thrift inserts
The contrib packages don't get as much love as core HBase, so they tend to be less performant and/or less reliable and/or less maintained, etc. In this case the issue doesn't seem that bad since it could just use an HTablePool, but using IndexedTables will definitely be slower than straight inserts since each write goes to 2 tables (the main table and the index). J-D On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas c...@email.com wrote: It appears that for multiple simultaneous loads, using the IndexedTables is probably not the best choice? -chris
Re: EC2 + Thrift inserts
On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas c...@email.com wrote: Thank you, it is nice to get this help. I definitely understand the overhead of writing the index, although it seems much worse than just that overhead would indicate. If I understand you correctly, that is because all inserts into an IndexedTable are synchronized on one table? If that was switched to using an HTablePool it would no longer be as severe a bottleneck (performance is about an order of magnitude better without the indexing)? They are synchronized per region server, yes, and it _should_ be better with a pool since then you can do parallel inserts. Patching it doesn't seem hard, but maybe I'm missing some finer details since I usually don't work around that code. I'm also using thrift to connect and am wondering if that itself puts an overall limit on scaling? It does seem that no matter how many more mappers and servers I add, even without indexing, I am capped at about 5k rows/sec total. I'm waiting a bit as the table grows so that it is split across more regionservers, hopefully that will help, but as far as I can tell I am not hitting any CPU or IO constraint during my tests. I don't understand the 'I'm also using thrift' and 'how many more mappers' part; are you using Thrift inside a map? Anyways, more clients won't help since there's a single mega serialization of all the inserts to the index table per region server. It's normal not to see any CPU/mem/IO contention since, in this case, it's all about the speed at which you can process a single row insertion. The rest of the threads just wait... -chris
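To illustrate the pool idea in J-D's reply, here is a rough sketch (table name, pool size and column coordinates are hypothetical, and the 0.20-era HTablePool API is assumed) of writers borrowing a table per operation instead of serializing on one shared HTable:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;

    public class PooledWriter {
      // one pooled HTable per concurrent writer, instead of one shared instance
      private static final HTablePool POOL = new HTablePool(new HBaseConfiguration(), 40);

      public static void write(byte[] row, byte[] family, byte[] qualifier, byte[] value)
          throws java.io.IOException {
        HTable table = POOL.getTable("mytable"); // borrow from the pool
        try {
          Put put = new Put(row);
          put.add(family, qualifier, value);
          table.put(put);
        } finally {
          POOL.putTable(table); // hand it back for the next writer
        }
      }
    }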
Re: Hbase: GETs are very slow
So we chatted a bit on IRC; the reason GETs were slow is that block caching was disabled and all calls were hitting HDFS. I was confused by the first email as it seemed that for some time it was still speedy without caching. I wanted to look at the import issue, but logs weren't available. J-D On Fri, Apr 30, 2010 at 10:44 AM, Ruben Quintero rfq_...@yahoo.com wrote: We're running 20.3, and it has a 6 GB heap. With block caching on, it seems we were running out of memory. It would temporarily lose a region server (usually when it attempted to split) and that caused a chain reaction when it attempted to recover. The heap would start to surge and cause a heavy garbage collection. We would have nodes dropping in and out, and getting overloaded when they rejoined. We found a post in a mailing list that recommended turning off block caching, and it ran well after that. As for swap, that was my first guess. How can I make sure it's not swapping, or is there a way to see if it is? Thanks, - Ruben
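For readers hitting the same symptom, a minimal sketch (hypothetical table and family names, 0.20-era admin API assumed) of turning the block cache back on for a column family:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class EnableBlockCache {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        admin.disableTable("mytable"); // schema changes need the table offline
        HColumnDescriptor family = new HColumnDescriptor("content");
        family.setBlockCacheEnabled(true); // repeated GETs hit the cache instead of HDFS
        admin.modifyColumn("mytable", "content", family);
        admin.enableTable("mytable");
      }
    }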
Re: EC2 + Thrift inserts
Not sure why you are going through thrift if you are already using java (you want to test thrift's speed because java isn't your main dev language?) but it will maybe add 1 or 2 ms, really not that bad. Here at StumbleUpon we use thrift to get our php website to talk to HBase and on average we stay under 10ms for random gets. Our machines are 2xi7, 24GB, 4x1TB sata. My coworker (Stack) pinged the author of the contrib to see if he can make a patch for your issue. J-D On Fri, Apr 30, 2010 at 4:51 PM, Chris Tarnas c...@email.com wrote: Sorry - should have been more clear. I'm testing now with normal tables and regionservers and I seem to cap out at about 5-7k rows a second for inserts. My method for doing inserts is to use map reduce on hadoop to launch many insert processes; each process uses the local thrift server on each node to connect to hbase. In this case I hope that other threads can insert at the same time. -chris
Re: Hbase Hive
Inline (and added hbase-user to the recipients). J-D On Thu, Apr 29, 2010 at 9:23 PM, Amit Kumar amkumar@gmail.com wrote: Hi Everyone, I want to ask about Hbase and Hive. Q1: Is there any dialect available that can be used with Hibernate to create persistence with Hbase? Has somebody written one? I came across HBql at www.hbql.com. Can this be used to create a dialect for Hbase? HBQL queries HBase directly, but it's not SQL-compliant and doesn't feature relational keywords (since HBase doesn't support them, JOINs don't scale). I don't know if anybody has tried integrating HBQL in Hibernate... it's still a very young project. Q2: Once the data is in Hbase. In this link I found that it can be used with Hive ( https://issues.apache.org/jira/browse/HIVE-705 ). So the question is: is it safe enough to use the architecture below for an application? Hibernate -- Dialect for Hbase -- Hbase -- query from Hbase using Hive to use MapReduce effectively. Hive goes on top of HBase, so you can use its query language to mine HBase tables. Be aware that a MapReduce job isn't meant for live queries, so issuing them from Hibernate doesn't make much sense... unless you meant something else, in which case please do give more details. Thanks Regards Amit Kumar
Re: data node stops on slave
Looks like your nodes share the same storage (NFS share or SAN?), and only one DN can serve it (else it would be unmanageable). J-D On Mon, Apr 26, 2010 at 3:03 AM, Muhammad Mudassar mudassa...@gmail.com wrote: I have posted the problem in common-user but no one replied, so now I am sending it here to get some help on the issue. -- Forwarded message -- From: Muhammad Mudassar mudassa...@gmail.com Date: Fri, Apr 23, 2010 at 4:59 PM Subject: data node stops on slave To: common-u...@hadoop.apache.org Hi, I am following the tutorial Running Hadoop On Ubuntu Linux (Multi-Node Cluster) * http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) *http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29 for configuring a 2-node cluster, but I am facing a problem: the data node on the slave machine goes down after some time. Here I am sending the log file of the datanode on the slave machine and the log file of the namenode on the master machine; kindly help me solve the issue. *Log file of data node on slave machine* 2010-04-23 17:37:17,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop-desktop/127.0.1.1 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2010-04-23 17:37:19,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.3.31.221:54310. Already tried 0 time(s). 2010-04-23 17:37:25,303 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean 2010-04-23 17:37:25,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010 2010-04-23 17:37:25,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s 2010-04-23 17:37:30,777 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 50075 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075 2010-04-23 17:37:30,833 INFO org.mortbay.log: jetty-6.1.14 2010-04-23 17:37:31,242 INFO org.mortbay.log: Started selectchannelconnec...@0.0.0.0:50075 2010-04-23 17:37:31,279 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null 2010-04-23 17:37:36,608 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting 2010-04-23 17:37:36,611 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting 2010-04-23 17:37:36,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(hadoop-desktop:50010, storageID=DS-463609775-127.0.1.1-50010-1271833984369, infoPort=50075, ipcPort=50020) 2010-04-23 17:37:36,639 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.3.31.220:50010, storageID=DS-463609775-127.0.1.1-50010-1271833984369, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/home/hadoop/Desktop/dfs/datahadoop/dfs/data/current'} 2010-04-23 17:37:36,639 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 0msec 2010-04-23 17:37:36,653 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 17 blocks got processed in 6 msecs 2010-04-23 17:37:36,665 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner. 2010-04-23 17:37:39,641 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_REGISTER 2010-04-23 17:37:42,645 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 10.3.31.220:50010 is attempting to report storage ID DS-463609775-127.0.1.1-50010-1271833984369. Node 10.3.31.221:50010 is expected to serve this storage. at
Re: Get operation in HBase Map-Reduce methods
What are the numbers like? Is it 1k rows you need to process? 1M? 10B? Your question is more about scaling (or the need to). J-D On Tue, Apr 20, 2010 at 8:39 AM, Andrey atimerb...@gmx.net wrote: Dear All, Assume I've got a list of rowIDs of a HBase table. I want to get each row by its rowID, do some operations with its values, and store the results somewhere subsequently. Is there a good way to do this in a Map-Reduce manner? As far as I understand, a mapper usually takes a Scan to form inputs. It is quite possible to create such a Scan, which contains a lot of RowFilters to be EQUAL to a particular rowId. Such a strategy will work for sure, however it is inefficient, since each filter will be matched against each scanned row. So, is there a good Map-Reduce practice for this kind of situation? (E.g. to make a Get operation inside a map() method.) If yes, could you kindly point to a good code example? Thank you in advance.
Re: Get operation in HBase Map-Reduce methods
That can be done in a couple of seconds using the normal HBase client in a multithreaded process, fed by a message queue if you feel like it. What were you trying to achieve using MR? J-D On Tue, Apr 20, 2010 at 12:54 PM, Andrey atimerb...@gmx.net wrote: Yes, about 1k rows currently. In the future it may happen to be more: some tens of thousands. Andrey
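A hedged sketch of the multithreaded client J-D suggests (the table name is hypothetical; HTable is not thread-safe, so each worker gets its own instance):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;

    public class ParallelGets {
      // Split the rowID list across N workers, each with its own HTable.
      public static void fetch(final List<byte[]> rowIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final int chunk = (rowIds.size() + threads - 1) / threads;
        for (int i = 0; i < rowIds.size(); i += chunk) {
          final List<byte[]> slice = rowIds.subList(i, Math.min(i + chunk, rowIds.size()));
          pool.submit(new Runnable() {
            public void run() {
              try {
                HTable table = new HTable(new HBaseConfiguration(), "mytable");
                for (byte[] rowId : slice) {
                  Result result = table.get(new Get(rowId));
                  // ... do some operations with the values, store the result ...
                }
              } catch (java.io.IOException e) {
                e.printStackTrace();
              }
            }
          });
        }
        pool.shutdown();
      }
    }

A thousand gets split across, say, 20 such workers is the "couple of seconds" case; a message queue can replace the in-memory list when the stream of rowIDs is continuous.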
Re: About the Log entries?
You are reading it wrong. The second line you pasted shows how many edits were applied to region test17,,1271654370789, without telling you which log they were coming from. Your log has edits from all regions, including META and ROOT if present on that RS. But do expect data loss on un-rolled logs, since that version of HDFS doesn't support fsync. J-D On Mon, Apr 19, 2010 at 8:54 AM, ChingShen chingshenc...@gmail.com wrote: Hi, I wrote a sequential put example (300,000 rows, the memstore will not reach 64MB) to check how the HLog works. 2010-04-19 13:51:25,340 INFO org.apache.hadoop.hbase.regionserver.HLog: * Roll* /hbase/.logs/52-0980216-01,48562,1271656125926/hlog.dat.1271656125952, entries=*29*, calcsize=63753517, filesize=32619925. New hlog /hbase/.logs/52-0980216-01,48562,1271656125926/hlog.dat.1271656285337 After I enter the kill -9 master_pid command and restart hbase: 2010-04-19 13:53:57,578 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://localhost/hbase/test17/955259787/content/3876923764760772557, entries=*291065*, sequenceid=32230123, memsize=48.9m, filesize=15.0m to test17,,1271654370789 But why can I only get *291065* rather than *29* rows? data loss? Thanks. Shen
Re: Performance Evaluation randomRead failures after 20% of execution
Not sure where to start, there are so many things wrong with your cluster. ;) Commodity hardware is usually more than 1 CPU, and HBase itself requires 1GB of RAM. Looking at slave2 for example, your datanode, region server and MR processes are all competing for 512MB of RAM and 1 CPU. In the log lines you pasted, the more important stuff is: 2010-04-17 19:11:20,864 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 15430ms, ten times longer than scheduled: 1000 That means the JVM was pausing (because of GC, or swapping, or most probably both) and becomes unresponsive. If you really wish to run processing on that cluster, I would use the master and slave1 as datanodes and region servers, then slave2 as MapReduce only. Also slave1 should have the Namenode, HBase Master and Zookeeper since it has more RAM. Then I would configure the heaps so that I wouldn't swap, and configure only 1 map and 1 reduce (not the default of 2). But still, I wouldn't expect much processing juice out of that. J-D On Sat, Apr 17, 2010 at 8:13 PM, jayavelu jaisenthilkumar joysent...@gmail.com wrote: Hi guys, I successfully configured hadoop, mapreduce and hbase. Now I want to run PerformanceEvaluation a bit. The configuration of our systems: Master Machine: Processor: Intel Centrino Mobile Technology Processor 1.66 GHz CPUs Memory: 1 GB/Go DDR2 SDRAM Storage: 80 GB/Go Network: Gigabit Ethernet Slave 1 Machine: Processor: Core 2 Duo Intel T5450 Processor 1.66 GHz CPUs Memory: 2 GB/Go DDR2 SDRAM Storage: 200 GB/Go Network: Gigabit Ethernet Slave 2 Machine: Processor: Intel(R) Pentium(R) M processor 1400MHZ Memory: 512 MB RAM Storage: 45 GB Network: Gigabit Ethernet The PerformanceEvaluation algorithms sequentialWrite and sequentialRead ran successfully. We followed the same procedure for randomWrite and randomRead. randomWrite was successful but randomRead failed. See the output below for the randomRead. (The CPU/memory usage was 94%; is that the reason?) had...@hadoopserver:~/hadoop-0.20.1/bin ./hadoop org.apache.hadoop.hbase.PerformanceEvaluation randomRead 3 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:host.name=Hadoopserver 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_15 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_15/jre 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-0.20.1/bin/../conf:/usr/java/jdk1.6.0_15/lib/tools.jar:/home/hadoop/hadoop-0.20.1/bin/..:/home/hadoop/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-codec-1.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-logging-1.0.4.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jetty-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jetty-util-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/home/hadoop/hbase-0.20.3/hbase-0.20.3.jar:/home/hadoop/hbase-0.20.3/conf:/home/hadoop/hbase-0.20.3/hbase-0.20.3-test.jar:/home/hadoop/hbase-0.20.3/lib/zookeeper-3.2.2.jar 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-0.20.1/bin/../lib/native/Linux-i386-32 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:java.compiler=NA 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
Re: hitting xceiverCount limit (2047)
Sujee, How many regions do you have and how many families per region? Looks like your datanodes have to keep a lot of xcievers open. J-D On Tue, Apr 13, 2010 at 9:03 PM, Sujee Maniyam su...@sujee.net wrote: Thanks Stack. Do I also need to tweak timeouts? Right now they are at default values for both hadoop / hbase http://sujee.net On Tue, Apr 13, 2010 at 11:40 AM, Stack st...@duboce.net wrote: Looks like you'll have to up your xceivers or up the count of hdfs nodes. St.Ack On Tue, Apr 13, 2010 at 11:37 AM, Sujee Maniyam su...@sujee.net wrote: Hi all, I have been importing a bunch of data into my hbase cluster, and I see the following error: Hbase error : hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink A.B.C.D Hadoop data node error: DataXceiver : java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047 I have configured dfs.datanode.max.xcievers = 2047 in hadoop/conf/hdfs-site.xml Config: amazon ec2 c1.xlarge instances (8 CPU, 8G RAM) 1 master + 4 region servers hbase heap size = 3G Upping the xcievers count is an option; I want to make sure if I need to tweak any other parameters to match this. thanks Sujee http://sujee.net
Re: hitting xceiverCount limit (2047)
Exactly what you think: since all the xcievers are full, HBase cannot write to HDFS, so the files cannot be persisted. This usually ends up shutting down the RS since we don't want to mess things up even more. Then the master does a log replay to recover edits that were in the memstore. 7k regions is too much for that cluster. Every region has at least one file for the .regioninfo plus a bunch of others for the store files of the column family (at least 1 file). There's one xceiver per block being served (a block is 64MB), so with only 8k xceivers you simply cannot support that many regions. Is your table LZOed? If not, do consider it! J-D On Tue, Apr 13, 2010 at 10:42 PM, Sujee Maniyam su...@sujee.net wrote: J-D, - about 7000 regions (spread over 4 region servers). - one column family. - each row is about 1kbytes - 400M rows When the xciever limit is hit, I see the following errors in the master log: INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.210.X.Y:50010 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3157562535002015020_4324755 What exactly does 'abandoning block' mean? thanks Sujee http://sujee.net
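As a rough back-of-envelope on the numbers in this thread (assuming about two files per region: the .regioninfo plus one store file), 7000 regions x 2 files is roughly 14,000 open files; spread over only 4 datanodes, that is on the order of 3,500 xceiver threads per node, comfortably above the 2047 limit that was hit.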
Re: Why does throw java.io.IOException when I run a job?
Did you restart Hadoop after changing the configs? If you get the error it means that it wasn't picked up, so there aren't that many things to check (checks that only you can do). J-D On Mon, Apr 12, 2010 at 4:28 AM, 无名氏 sitong1...@gmail.com wrote: hi I received an IOException when I run a job ... java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88) at java.lang.Thread.run(Thread.java:619) But I have configured dfs.datanode.max.xcievers to 4096.

core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/${user.name}/tmp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://search9b.cm3:9000</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <values>4096</values>
  </property>
  <property>
    <name>fs.inmemory.size.mb</name>
    <values>200</values>
  </property>
  <property>
    <name>io.sort.factor</name>
    <values>100</values>
  </property>
  <property>
    <name>io.sort.mb</name>
    <values>200</values>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <values>131072</values>
  </property>
  <property>
    <name>mapred.job.tracker.handler.count</name>
    <values>60</values>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <values>50</values>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <values>50</values>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <values>-Xmx1024M</values>
  </property>
</configuration>

hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>dfs/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <values>4096</values>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <values>40</values>
  </property>
  <property>
    <name>dfs.datanode.handler.coun</name>
    <values>9</values>
  </property>
</configuration>

thks.
Re: set number of map tasks for HBase MR
A map against a HBase table by default cannot have more tasks than the number of regions in that table. Also you want to enable scanner caching. Pass a Scan object to the TableMapReduceUtil.initTableMapperJob that is configured with scan.setCaching(some_value) where the value should be the number of rows to fetch every time we hit a region server with next(). On rows of 100-200 bytes, our jobs usually are configured with 1000 up to 1. Finally, is your job running in local mode or on a job tracker? Even if HBase uses HDFS, it usually doesn't know of the job tracker unless you configure HBase's classpath with Hadoop's conf. J-D On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko cryp...@mail.saturnfans.com wrote: Hi, thanks for the quick response. I tried to do the following in the code: job.getConfiguration().setInt("mapred.map.tasks", 1); but unfortunately have the same result. Any other ideas? --- ama...@gmail.com wrote: From: Amandeep Khurana ama...@gmail.com To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com Subject: Re: set number of map tasks for HBase MR Date: Sat, 10 Apr 2010 20:04:18 -0700 You can set the number of map tasks in your job config to a big number (eg: 10), and the library will automatically spawn one map task per region. -ak Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko cryp...@mail.saturnfans.com wrote: Hi guys, I have an 8G Hbase table and I want to run an MR job against it. It runs extremely slowly in my case. One thing I noticed is that the job runs only 2 map tasks. Is there any way to set up a bigger number of map tasks? I saw some method in the mapred package, but can't find anything like it in the new mapreduce package. I run my MR job on a single machine in cluster mode.
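A minimal sketch of the scanner-caching setup described above (the table name, mapper class and caching value are hypothetical examples):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class ScanCachingSetup {
      // placeholder mapper; real processing goes in map()
      public static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
        protected void map(ImmutableBytesWritable key, Result row, Context context) {
          // ... process one row ...
        }
      }

      public static void configure(Job job) throws IOException {
        Scan scan = new Scan();
        scan.setCaching(1000); // rows fetched per next() round-trip to a region server
        TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
            ImmutableBytesWritable.class, Result.class, job);
      }
    }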
Re: Region not getting served
Exactly which version of hbase are you using? From digging through HStoreKey's SVN history to match the line numbers, you seem to be on the 0.19 branch. Any reason you are using something that's a year old compared to 0.20.3, which was released last January? BTW your splitting isn't wrong; the region server is trying to parse the column family and there's something null where it shouldn't be. J-D On Sun, Apr 11, 2010 at 10:55 AM, john smith js1987.sm...@gmail.com wrote: Hi all, I wrote my own getSplits() function for HBase-MR. A is a table involved in the MR. I am getting the following stack trace. It seems that it couldn't access the region, but my region server is up and running. Does it indicate that my splitting is wrong? http://pastebin.com/YBK4JQBu Thanks j.S
Re: set number of map tasks for HBase MR
Yes, an option could be added, along with a write buffer option for Import. J-D On Sun, Apr 11, 2010 at 3:30 PM, Ted Yu yuzhih...@gmail.com wrote: I noticed mapreduce.Export.createSubmittableJob() doesn't call setCaching() in 0.20.3. Should a call to setCaching() be added? Thanks
Re: Region not getting served
WRT the original problem, I only see the result and not the code or anything else. Help me help you. (But it's probably better in 0.20, hence why I suggest upgrading.) Text implements WritableComparable, so it's not your problem. TextArrayWritable is not in the 0.20 branch IIRC; that should be the problem. J-D On Sun, Apr 11, 2010 at 7:32 PM, john smith js1987.sm...@gmail.com wrote: J.D, I tried working with the 0.20+ branch of hadoop and Hbase. I changed my build paths in eclipse and I found the following errors: public class MyTableMap extends MapReduceBase implements TableMap<Text, TextArrayWritable> { It is saying that the type in the position of Text must extend WritableComparable, which is true for the hadoop 0.19 branch, whereas it shows errors for the 0.20+ branch because the Text class extends BinaryComparable. Any solution to this, or to the original problem (as you said, some problem with the parsing)? Kindly help me. Thanks On Sun, Apr 11, 2010 at 6:07 PM, john smith js1987.sm...@gmail.com wrote: J.D. Thanks for replying. My hbase version is 0.19.3. Because I wrote a lot of code for this version, I haven't updated it. Also I'll check if there's any problem with my column family naming, such as a missing ':' etc., and I'll let you know. Thanks
Re: HTable Client RS caching
On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote: Occasionally my HTable clients get a response that no server is serving a particular region... Normally, the region is back a few seconds later (perhaps a split?). Or the region moved. Anyway, the client (using HTablePool) seems to need a restart to forget this. Seems wrong, would love a stack trace. Is there a config value to manipulate the caching time of regionserver assignments in the client? Nope, when the client sees a NSRE, it queries .META. to find the new location. I set a small value for hbase.client.pause to get failures fast. I am using 0.20.3. Splits are still kinda slow, they take at least 2 seconds to happen, but finding the new location of a region is a core feature in HBase and it's rather well tested. Can you pin down your exact problem? Next time a NSRE happens, see which region it was looking for and grep the master log for it; you should see the history and how much time it took to move. Thx, Al
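For context, a small sketch (the values shown are illustrative only) of the client-side knobs mentioned in this exchange; hbase.client.pause is the sleep between retries, so lowering it surfaces failures faster, while keeping a few retries lets an NSRE be re-resolved via .META.:

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientTuning {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.setLong("hbase.client.pause", 200);       // ms between retries; lower = faster failures
        conf.setInt("hbase.client.retries.number", 3); // keep retries so the .META. cache can recover
      }
    }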
Re: Received RetriesExhaustedException when write to hbase table, or received WrongRegionException when read from hbase table.
Without knowing what happened, it's hard to propose a cure... Anyways, restarting the cluster normally takes care of such a situation, or you can recreate all the .META. entries by running bin/add_table.rb J-D 2010/4/7 无名氏 sitong1...@gmail.com: I am anxious to know how to repair the region, or recreate the region so I can continue writing. No need to recover data. thks
Re: HTable Client RS caching
No it's there: domaincrawltable,,1270600690648 J-D On Thu, Apr 8, 2010 at 10:38 AM, Ted Yu yuzhih...@gmail.com wrote: What if there is no region information in NSRE ? 2010-04-08 10:26:38,385 ERROR [IPC Server handler 60 on 60020] regionserver.HRegionServer(846): Failed openScanner org.apache.hadoop.hbase.NotServingRegionException: domaincrawltable,,1270600690648 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1893) at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) On Thu, Apr 8, 2010 at 9:39 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote: On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote: Occationally my HTable clients get a response that no server is serving a particular region... Normally, the region is back a few seconds later (perhaps a split?). Or the region moved. Anyway, the client (Using HTablePool) seems to need a restart to forget this. Seems wrong, would love a stack trace. Is there a config value to manipulate the caching time of regionserver assignments in the client? Nope, when the client sees a NSRE, it queries .META. to find the new location. I set a small value for hbase.client.pause to get failures fast. I am using 0.20.3 . Splits are still kinda slow, takes at least 2 seconds to happen, but finding the new location of a region is a core feature in HBase and it's rather well tested, Can you pin down your exact problem? Next time a NSRE happens, see which region it was looking for and grep the master log for it, you should see the history and how much time it took to move. Thx, Al
Re: Received RetriesExhaustedException when write to hbase table, or received WrongRegionException when read from hbase table.
I would also like to know why your region server went bad, but I'm missing a lot of information here ;) Like the version of hadoop/hbase, size of your cluster, the hardware, what/how much are you trying to insert, and definitely some master and region server logs either in a pastebin or on a web server, not directly into the email. Thx, J-D On Wed, Apr 7, 2010 at 1:33 AM, 无名氏 sitong1...@gmail.com wrote: Some region server went bad, I suspect. When I write a record to the HBase table, it throws a RetriesExhaustedException: Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1, region=web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993 for region web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993, row 'r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19', but failed after 10 attempts. Exceptions: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1120) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605) at storage.client.FeedSchema.flushCommits(FeedSchema.java:72) When I read info from the HBase table: org.apache.hadoop.hbase.regionserver.WrongRegionException: org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993, startKey='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1', getEndKey()='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D643994', row='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19' at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1522) at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1554) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1622) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2285) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1788) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) I get the META info through the hbase shell. command: get '.META.', web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993 result : COLUMN CELL info:regioninfo timestamp=1270529567780, value=REGION = {NAME = 'web_info,r:http:\\x2F\\x2Fcom.
ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D\\x26page\\x3D1,1270529565993', STARTKEY => 'r:http:\\x2F\\x2Fcom.ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D\\x26page\\x3D1', ENDKEY => 'r:http:\\x2F\\x2Fcom.ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D643994', ENCODED => 1771513916, TABLE => {{NAME => 'web_info', FAMILIES => [{NAME => 'article_dedup', VERSIONS => '2', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'dedup', VERSIONS => '2', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'global', VERSIONS => '2', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true'}, {NAME => 'page_type', VERSIONS => '2', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'parser', VERSIONS => '2', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'pid_match', VERSIONS => '2', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
Re: HBase always corrupted
At StumbleUpon we have north of 20 billion rows, each of 100-200 bytes. Look in your datanode log for this http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5 or that http://wiki.apache.org/hadoop/Hbase/FAQ#A6 J-D On Wed, Apr 7, 2010 at 9:55 AM, Geoff Hendrey ghend...@decarta.com wrote: Hi, I am running an HBase instance in a pseudocluster mode, on top of a pseudoclustered HDFS, on a single machine. I have a 10 node map/reduce cluster that is using a TableMapper to drive a map/reduce job. In the map phase, two Gets are executed against HBase. The Map phase generates two orders of magnitude more data than was pumped in, and in the reduce phase we do some consolidation of the generated data, then execute a Put into HBase with autocommit=false, and the batch size set to 100,000 (I tried 1000,1 as well and found 100,000 worked best). I am using 32 reducers, and reduce seems to run 1000X slower than mapping. Unfortunately, the job consistently crashes around 85% reduce completion, with HDFS related errors from the HBase machine: java.io.IOException: java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting... at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2525) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2078) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2241) So I am clearly aware of the mismatch between the big mapreduce cluster, and the wimpy HBase installation, but why am I seeing consistent crashes? Shouldn't the HBase cluster just be slower, not unreliable? Here is my main question: should I expect that running a real HBase cluster will solve my problems and does anyone have experience with a map/reduce job that pumps several billion rows into HBase? -geoff
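Those two wiki entries boil down to the file-descriptor limit and the datanode transceiver limit; a quick check for the first one on each node (32768 is just the commonly suggested setting, not a hard rule):

  ulimit -n    # the 1024 default is too low for HBase; something like 32768 is typical

If it reports the default, raise the nofile limit in /etc/security/limits.conf for the user running HBase and HDFS.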
Re: HBase Client Maven Dependency in POM
Same answer I gave an hour ago to your other email: Sajeev, 0.20 isn't mavenized, the svn trunk is. J-D On Wed, Apr 7, 2010 at 10:46 AM, Sajeev Joseph sajeev.jos...@cypresscare.com wrote: I have HBase 0.20.3 up and running on my 'windows/cygwin' platform. Now, I would like to write a Java client to access the HBase server. This java client would be part of a large Enterprise Service Bus (ESB) application we currently have. We use maven as build tool with all our applications, and I would like to add an HBase client dependency in our 'POM' to pull in all the relevant jar files required by the HBase client API. After spending hours reading through HBase documentation, I couldn't find this mentioned anywhere. Am I missing something? Do you have a maven repo where I can pull in all jars required by the HBase Client? Thank you, Sajeev Joseph
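For reference, a minimal sketch of how you could consume the mavenized trunk build (the checkout URL is a placeholder and the steps are assumptions, not a documented procedure):

  svn checkout <hbase-trunk-svn-url> hbase-trunk
  cd hbase-trunk
  mvn -DskipTests install

For 0.20.3 itself there are no published Maven artifacts, so the jars from the release tarball would have to be installed into your local repository by hand, e.g. with mvn install:install-file, before a POM dependency on them can resolve.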
Re: DFS too busy/down? while writing back to HDFS.
From DataXceiver's javadoc /** * Thread for processing incoming/outgoing data stream. */ So it's a bit different from the handlers AFAIK. J-D On Mon, Apr 5, 2010 at 10:57 PM, steven zhuang steven.zhuang.1...@gmail.com wrote: thanks, J.D. my cluster has the first problem. BTW, dfs.datanode.max.xcievers means the number of concurrent connections for a datanode right? On Tue, Apr 6, 2010 at 12:35 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Look at your datanode logs around the same time. You probably either have this http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5 or that http://wiki.apache.org/hadoop/Hbase/FAQ#A6 Also you seem to be putting a fair number of regions on those region servers judging by the metrics, do consider setting HBASE_HEAPSIZE higher than 1GB in conf/hbase-env.sh J-D On Mon, Apr 5, 2010 at 8:38 PM, steven zhuang steven.zhuang.1...@gmail.com wrote: greetings, while I was importing data into my HBase Cluster, I found one regionserver is down, and by checking the log, I found the following exception: *EOFException* (during HBase flushing memstore to an HDFS file? not sure) seems that it's caused by DFSClient not working, I don't know the exact reason, maybe it's caused by the heavy load on the machine the datanode is residing on, or the disk is full, but I am not sure which DFS node I should check. has anybody met the same problem? any pointer or hint is appreciated. The log is as follows: 2010-04-06 03:04:34,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 20 on 60020' on region hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking 128.0m size 2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 34; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047; store size is 2.9m 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 5 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438, entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to hbt2table16,,1270522012397 2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 35; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032; store size is 2.9m 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130, entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to hbt2table16,,1270522012397 2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005, entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to hbt2table16,,1270522012397 2010-04-06 03:04:35,866 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started.
Attempting to free 20853136 bytes 2010-04-06 03:04:37,010 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed. Freed 20866928 bytes. Priority Sizes: Single=17.422821MB (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0) 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 36; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916; store size is 2.9m 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.*EOFException* 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block
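On the first of those two wiki entries (the xcievers one), the usual remedy is to raise the datanode's transceiver limit in hdfs-site.xml on every datanode and restart them; a minimal sketch (2047 is just a commonly suggested value, not a verified recommendation for this cluster):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>

Note the property name really is spelled xcievers.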
Re: hbase mapreduce scan
Or put it in MySQL, or in S3, or...or... so my point was that you need a recipient that transcends the JVMs ;) So it is doable and pretty normal to output in tables the result of MRs that map other tables, we have dozens of those here at StumbleUpon. But if it fits in a single HashMap in a single JVM, my guess is that the output is very small, hence this is an operation done for live clients and not suitable for MR. J-D On Tue, Apr 6, 2010 at 4:34 AM, Michael Segel michael_se...@hotmail.com wrote: J-D, There's an alternative... He could write a M/R that takes the input from a scan(), do something, reduce() and then output the reduced set back to hbase in the form of a temp table (even an in-memory temp table) and then at the end pull the data out into a hash table? In theory this should be possible, but I haven't had time to play with in-memory tables. No? Thx -Mike Date: Mon, 5 Apr 2010 09:57:02 -0700 Subject: Re: hbase mapreduce scan From: jdcry...@apache.org To: hbase-user@hadoop.apache.org You want to put the result in a HashMap? MapReduce is a batch processing framework that runs multiple parallel JVMs over a cluster of machines so I don't see how you could simply output in a HashMap (unless you don't mind outputting on disk, then reading it back into a HashMap). So I will guess that you want to do a live query against HBase, here MR won't help you since that is meant for bulk processing which usually takes more than a minute. What you want to use is a Scan, using HTable. The unit tests have tons of examples on how to use a scanner, look in the org.apache.hadoop.hbase.client package, you will find what you need. The main client package also contains some examples http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/package-summary.html J-D On Sun, Apr 4, 2010 at 11:18 AM, Jürgen Jakobitsch jakobits...@punkt.at wrote: hi, i'm totally new to hbase and mapreduce and could really need some pointer into the right direction for the following situation. i managed to run a basic mapreduce example - analogous to Export.java in the hbase.mapreduce package. what i need to achieve is the following: do a map/reduce scan on a hbase table and put the results into a HashMap. could someone point me to an example. any help really appreciated wkr turnguard.com/turnguard -- punkt. netServices __ Jürgen Jakobitsch Codeography Lerchenfelder Gürtel 43 Top 5/2 A - 1160 Wien Tel.: 01 / 897 41 22 - 29 Fax: 01 / 897 41 22 - 22 netServices http://www.punkt.at
Re: enabling hbase metrics on a running instance
This boils down to the question: can you enable JMX while the JVM is running? The answer is no (afaik). More docs here http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html J-D On Tue, Apr 6, 2010 at 4:12 PM, Igor Ranitovic irani...@gmail.com wrote: Is it possible to enable the hbase metrics without a restart? Thanks. i.
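So the JMX exporting has to be configured before startup; a minimal sketch of what that might look like in conf/hbase-env.sh (the ports are illustrative, and disabling auth/ssl like this is only sensible on a trusted network):

  export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
  export HBASE_MASTER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
  export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

After a rolling restart, the metrics become visible to jconsole or any other JMX client.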
Re: hbase mapreduce scan
You want to put the result in a HashMap? MapReduce is a batch processing framework that runs multiple parallel JVMs over a cluster of machines so I don't see how you could simply output in a HashMap (unless you don't mind outputting on disk, then reading it back into a HashMap). So I will guess that you want to do a live query against HBase, here MR won't help you since that is meant for bulk processing which usually takes more than a minute. What you want to use is a Scan, using HTable. The unit tests have tons of examples on how to use a scanner, look in the org.apache.hadoop.hbase.client package, you will find what you need. The main client package also contains some examples http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/package-summary.html J-D On Sun, Apr 4, 2010 at 11:18 AM, Jürgen Jakobitsch jakobits...@punkt.at wrote: hi, i'm totally new to hbase and mapreduce and could really need some pointer into the right direction for the following situation. i managed to run a basic mapreduce example - analogous to Export.java in the hbase.mapreduce package. what i need to achieve is the following: do a map/reduce scan on a hbase table and put the results into a HashMap. could someone point me to an example. any help really appreciated wkr turnguard.com/turnguard -- punkt. netServices __ Jürgen Jakobitsch Codeography Lerchenfelder Gürtel 43 Top 5/2 A - 1160 Wien Tel.: 01 / 897 41 22 - 29 Fax: 01 / 897 41 22 - 22 netServices http://www.punkt.at
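To make that concrete, a minimal sketch of the suggested scanner usage against the 0.20.3 client API (table and family names are made up):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  HBaseConfiguration conf = new HBaseConfiguration();  // picks up hbase-site.xml from the classpath
  HTable table = new HTable(conf, "myTable");
  Scan scan = new Scan();
  scan.addFamily(Bytes.toBytes("myFamily"));           // restrict the scan to one family
  ResultScanner scanner = table.getScanner(scan);
  try {
    for (Result result : scanner) {
      // live query: collect what you need here, e.g. into your HashMap
    }
  } finally {
    scanner.close();                                   // always release the scanner's server-side resources
  }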
Re: DFS too busy/down? while writing back to HDFS.
Look at your datanode logs around the same time. You probably either have this http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5 or that http://wiki.apache.org/hadoop/Hbase/FAQ#A6 Also you seem to be putting a fair number of regions on those region servers judging by the metrics, do consider setting HBASE_HEAPSIZE higher than 1GB in conf/hbase-env.sh J-D On Mon, Apr 5, 2010 at 8:38 PM, steven zhuang steven.zhuang.1...@gmail.com wrote: greetings, while I was importing data into my HBase Cluster, I found one regionserver is down, and by checking the log, I found the following exception: *EOFException* (during HBase flushing memstore to an HDFS file? not sure) seems that it's caused by DFSClient not working, I don't know the exact reason, maybe it's caused by the heavy load on the machine the datanode is residing on, or the disk is full, but I am not sure which DFS node I should check. has anybody met the same problem? any pointer or hint is appreciated. The log is as follows: 2010-04-06 03:04:34,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 20 on 60020' on region hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking 128.0m size 2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 34; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047; store size is 2.9m 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 5 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438, entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to hbt2table16,,1270522012397 2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 35; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032; store size is 2.9m 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130, entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to hbt2table16,,1270522012397 2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005, entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to hbt2table16,,1270522012397 2010-04-06 03:04:35,866 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started. Attempting to free 20853136 bytes 2010-04-06 03:04:37,010 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed. Freed 20866928 bytes.
Priority Sizes: Single=17.422821MB (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0) 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 36; new storefile is hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916; store size is 2.9m 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.*EOFException* 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2467598422201289982_1391902 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2065206049437531800_1391902 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3059563223628992257_1391902 2010-04-06 03:05:01,588 WARN org.apache.hadoop.hdfs.DFSClient:
Re: More about LogFlusher
LogFlusher isn't doing any FS operations, it calls HLog which does them. HLog calls sync on the SequenceFile.Writer, which is a marker, and also calls fsSync if you patched your hadoop (and replaced the hbase's hadoop jar with the patched one) with HDFS-200,826,142. J-D On Fri, Apr 2, 2010 at 2:05 AM, ChingShen chingshenc...@gmail.com wrote: Hi, Does anyone know more about the org.apache.hadoop.hbase.regionserver.LogFlusher? I don't know why it just invokes SequenceFile.Writer.sync(). It just writes a marker into the file. Can anyone explain it to me please? Thanks. Shen
Re: Failed to create /hbase.... KeeperErrorCode = ConnectionLoss for /hbase
If the master doesn't shut down, it means it's waiting on something... you looked at the logs? You say you ran ./jps ... did you install that in the local directory? Also what do you mean it didn't work as well? What didn't work? The command didn't return anything or the HMaster process wasn't listed? Also did you check the zookeeper logs like Patrick said? You should see in there when the master tries to connect, and you should see why it wasn't able to do so. To help you I need more data about your problem. J-D On Thu, Apr 1, 2010 at 11:39 AM, jayavelu jaisenthilkumar joysent...@gmail.com wrote: Hi Daniel, I removed the property tags from the hbase-site.xml. The same error occurs. Also one strange behaviour: if I run ./stop-hbase.sh, the terminal says stopping master and it never stops. I couldn't run ./jps to check the Java processes in this scenario; it didn't work either. So I killed the HMaster process (ps -ef | grep java). I also had to manually kill the HRegionServer processes on master, slave1 and slave2. Any suggestions please... Regs, senthil On 31 March 2010 19:15, Jean-Daniel Cryans jdcry...@apache.org wrote: You set the tick time like this:
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>1</value>
  <description>Property from ZooKeeper's config zoo.cfg. The number of milliseconds of each tick. See zookeeper.session.timeout description.</description>
</property>
1 means HBase has to report to zookeeper every 1 millisecond and if for any reason it doesn't after 20ms, the session is expired (!!). I recommend using the default value. Also you should keep the same config on every node, rsync can do wonders. J-D On Wed, Mar 31, 2010 at 9:24 AM, jayavelu jaisenthilkumar joysent...@gmail.com wrote: Hi, I am using 1 master and 2 slaves, one of which has a password for ssh. I am using hadoop0.20.1 and hbase0.20.3 (direct one, not upgraded) 1) The password on that slave could not be disabled; I removed the whole .ssh directory and tried ssh-keygen with a passwordless phrase, but I am still asked for the password when I ssh localhost 2) I am able to run Hadoop and successfully run MapReduce in the Hadoop environment as per Running Hadoop On Ubuntu Linux (Multi-Node Cluster) by Noll 3) I am now following the hbase: overview tutorial in the HBase 0.20.3 API docs. It is not clearly stated how to go from the multi-node Hadoop cluster to distributed-mode HBase. I ran hdfs and hbase using start-dfs.sh and start-hbase.sh respectively. The master log indicates connection loss on /hbase (is this /hbase created by HBase, or do we have to create it ourselves?): 2010-03-31 16:45:57,850 INFO org.apache.zookeeper.
ClientCnxn: Attempting connection to server Hadoopserver/ 192.168.1.65: 2010-03-31 16:45:57,858 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/ 192.168.1.65:43017 remote=Hadoopserver/192.168.1.65:] 2010-03-31 16:45:57,881 INFO org.apache.zookeeper.ClientCnxn: Server connection successful 2010-03-31 16:45:57,883 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@11c2b67 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-03-31 16:45:57,933 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned) 2010-03-31 16:45:57,934 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition 2010-03-31 16:45:58,024 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to read
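One quick sanity check when ConnectionLoss shows up is ZooKeeper's four-letter status command, run from the master box against each quorum member (hostname taken from the log above; 2181 is assumed to be the client port, since the log lines truncate it):

  echo stat | nc Hadoopserver 2181

If that hangs or prints nothing, the quorum itself is the problem rather than HBase's configuration.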
Re: Why did HBase die after a regionserver stopped.
(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) 2010-03-30 00:58:59,672 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 172.23.51.55:50010, storageID=DS-225596341-172.23.51.55-50010-1261706639224, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.23.51.55:50010 remote=/ 172.23.51.55:47568] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) In hbase log, I found org.apache.hadoop.hbase.NotServingRegionException: web_info,,1267870002080 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1896) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2010-03-31 14:16:39,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, call openScanner([...@6defe475, startRow=, stopRow=, maxVersions=1, caching=10, cacheBlocks=false, timeRange=[0,9223372036854775807), families=ALL) from 172.23.52.58:42223: error: org.apache.hadoop.hbase.NotServingRegionException: web_info,,1267870002080 2010/3/31 Jean-Daniel Cryans jdcry...@apache.org Please provide us with the usuals: Hadoop/HBase version, configurations for both, hardware, OS, etc Also did you take a look at search.38d.cm3's region server log? Any obvious exceptions and if you google search them, can you find the solution? 
Thx J-D On Tue, Mar 30, 2010 at 7:50 PM, 无名氏 sitong1...@gmail.com wrote: I constructed an HBase cluster, and the regionserver list is search10a.cm3 search10b.cm3 search162a.cm3 search166a.cm3 search168a.cm3 search16a.cm3 search178a.cm3 search180a.cm3 search182a.cm3 search184a.cm3 search188a.cm3 search189a.cm3 search18b.cm3 search190a.cm3 search192a.cm3 search200t.cm3 search33d.cm3 search34c.cm3 search34d.cm3 search35c.cm3 search35d.cm3 search38d.cm3 search3a.cm3 search49a.cm3 search4a.cm3 search50a.cm3 search51a.cm3 search54b.cm3 search55b.cm3 search55d.cm3 search56b.cm3 search5a.cm3 search60a.cm3 search61a.cm3 search62a.cm3 build2.cme The regionserver search38d.cm3 stopped yesterday. Now when I run the hbase shell and execute the list command, it throws an exception. NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 5 attempts. Exceptions: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) org.apache.hadoop.hbase.NotServingRegionException
Re: Is NotServingRegionException really an Exception?
Arber, If your cluster doesn't recover, it means there's something else going on. Feel free to start a new thread on this mailing list to discuss that, posting relevant information like version, hardware, configurations and logs. J-D On Wed, Mar 31, 2010 at 9:39 AM, Yabo Xu arber.resea...@gmail.com wrote: Sorry for interrupting the thread. We also get the annoying NotServingRegionException once in a while (especially after intensive writing), and if it happens, it seems that the only way is to stop all the programs and restart HBase. Any better way to deal with it? (I tried the flush operation in the shell, but it does not work) Or how to avoid this from happening? Thanks, Arber On Wed, Mar 31, 2010 at 11:44 PM, Stack st...@duboce.net wrote: I always thought that the throwing of an exception to signal a moved region was broken, if only because it is disturbing to new users. See https://issues.apache.org/jira/browse/HBASE-72 Would be nice to change it. I don't think it's easy though. We'd need to rig the RPC so calls were enveloped or some such so we could pass status messages along with (or instead of) query results. St.Ack On Wed, Mar 31, 2010 at 8:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Mar 31, 2010 at 11:02 AM, Gary Helmling ghelml...@gmail.com wrote: Well I would still view it as an exceptional condition. The client asked for data back from a server that does not own that data. Sending back an exception seems like the appropriate response, to me at least. It's just an exceptional condition that's allowed to happen in favor of the optimization of caching region locations in memory on the client. I could see the reporting of the exception being misleading though if it's being logged at an error or warn level when it's a normal part of operations. What's the logging level of the messages? On Wed, Mar 31, 2010 at 10:51 AM, Al Lias al.l...@gmx.de wrote: On 31.03.2010 16:47, Gary Helmling wrote: NotServingRegionException is a normal part of operations when regions transition (ie due to splits). It's how the region server signals back to the client that it needs to re-lookup the region location in .META. (which is normally cached in memory by the client, so can become stale). I'm sure it can also show up as a symptom of other problems, but if you're not seeing any other issues, then it's nothing to be concerned about. Thx Gary, this is my point: I see this many times in the (production) logs when it is actually nothing to worry about. Shouldn't this rather be a normal response from a region server, instead of an Exception? Al On Wed, Mar 31, 2010 at 7:38 AM, Al Lias al.l...@gmx.de wrote: As I do see this Exception really often in our logs, I wonder if this indicates a regular thing (within splits etc) or if this is something that should not normally happen. I see it often in Jira as a reason for something else that fails, but for a regular client request, where the client is not perfectly up-to-date with region information, it looks like something normal. Am I right here? Al The LDAP APIs throw a ReferralException when you try to update a read-only slave, so there is a precedent for that. But it's true that an exception may be strong for something that is technically a warning.
Re: Failed to create /hbase.... KeeperErrorCode = ConnectionLoss for /hbase
You set the tick time like this:
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>1</value>
  <description>Property from ZooKeeper's config zoo.cfg. The number of milliseconds of each tick. See zookeeper.session.timeout description.</description>
</property>
1 means HBase has to report to zookeeper every 1 millisecond and if for any reason it doesn't after 20ms, the session is expired (!!). I recommend using the default value. Also you should keep the same config on every node, rsync can do wonders. J-D On Wed, Mar 31, 2010 at 9:24 AM, jayavelu jaisenthilkumar joysent...@gmail.com wrote: Hi, I am using 1 master and 2 slaves, one of which has a password for ssh. I am using hadoop0.20.1 and hbase0.20.3 (direct one, not upgraded) 1) The password on that slave could not be disabled; I removed the whole .ssh directory and tried ssh-keygen with a passwordless phrase, but I am still asked for the password when I ssh localhost 2) I am able to run Hadoop and successfully run MapReduce in the Hadoop environment as per Running Hadoop On Ubuntu Linux (Multi-Node Cluster) by Noll 3) I am now following the hbase: overview tutorial in the HBase 0.20.3 API docs. It is not clearly stated how to go from the multi-node Hadoop cluster to distributed-mode HBase. I ran hdfs and hbase using start-dfs.sh and start-hbase.sh respectively. The master log indicates connection loss on /hbase (is this /hbase created by HBase, or do we have to create it ourselves?): 2010-03-31 16:45:57,850 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server Hadoopserver/192.168.1.65: 2010-03-31 16:45:57,858 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.1.65:43017 remote=Hadoopserver/192.168.1.65:] 2010-03-31 16:45:57,881 INFO org.apache.zookeeper.ClientCnxn: Server connection successful 2010-03-31 16:45:57,883 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@11c2b67 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) 2010-03-31 16:45:57,933 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned) 2010-03-31 16:45:57,934 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition 2010-03-31 16:45:58,024 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to read:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 2010-03-31 16:45:58,422 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server Hadoopclient1/192.168.1.2: 2010-03-31 16:45:58,423 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/ 192.168.1.65:51219 remote=Hadoopclient1/192.168.1.2:] 2010-03-31 16:45:58,423 INFO org.apache.zookeeper.ClientCnxn: Server connection successful 2010-03-31 16:45:58,436 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@17b6643 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2010-03-31 16:45:58,437 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at
Re: Region splitting, compaction and merging
Hey Michal! Currently there's no tool you can use except cron, you can request a major compaction on a table by doing something like: echo major_compact 'some_table' | /path/to/hbase/bin/hbase shell You can merge regions using the merge tool but it must be run while HBase is down. You can run it like that: bin/hbase org.apache.hadoop.hbase.util.Merge Enabling compression on that table will allow it to stay small, use LZO (see the wiki). J-D 2010/3/31 Michał Podsiadłowski podsiadlow...@gmail.com: Hi hbase fans We started our cluster (HBase trunk + CDH3 with hbase dedicated patches) on our production environment and we left it running now for 2 days. Everything is working nicely but we didn't try to break it yet as we did previously ;) Still there are a few things that concern me. We have one table with only a few rows - around 200 x a few tens of KB - which are updated quite frequently - all records a few times an hour - sounds trivial but it keeps growing and splitting. Currently after 2 days there are 177 records kept in 4 regions, which IMHO is not good. I had to run major compaction manually to get rid of the invalidated data (from around 500MB to 0MB and a few in memStore according to the UI). As far as I can see in the logs there were no major compactions since we started 2 days ago. The question is - is it normal that tables grow so quickly and, being stuffed with garbage, get split? Secondly, is there a way to force hbase to perform major compaction at some particular time - i.e. 5 a.m. - so it doesn't generate unnecessary load during hot periods like the evening when there is a strong demand for performance? Or maybe I am exaggerating the problem and the influence on the whole system is negligible? Third, is there a way to merge split regions? As far as I can see there is https://issues.apache.org/jira/browse/HBASE-420 which is a minor issue. Cheers, Michal
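For the 5 a.m. idea, a minimal cron sketch building on that same shell one-liner (the table name and hbase path are placeholders):

  0 5 * * * echo "major_compact 'some_table'" | /path/to/hbase/bin/hbase shell

That requests the major compaction once a day at 05:00 from whatever machine carries the crontab and an hbase client install.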
Re: web interface is fragile?
Dave, Can you pastebin the exact error that was returned by the MR job? That looks like it's client-side (from HBase point of view). WRT the .META. and the master, the web page does do a request on every hit so if the region is unavailable then you can't see it. Looks like you kill -9'ed the region server? If so, it takes a minute to detect the region server failure and then split the write-ahead-logs so if .META. was on that machine, it will take that much time to have a working web page. Instead of kill -9, simply go on the node and run ./bin/hbase-daemon.sh stop regionserver J-D On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David buttl...@llnl.gov wrote: Hi, I have a small cluster (6 nodes, 1 master and 5 region server/data nodes). Each node has lots of memory and disk (16GB of heap dedicated to RegionServers), 4 TB of disk per node for hdfs. I have a table with about 1 million rows in hbase - that's all. Currently it is split across 50 regions. I was monitoring this with the hbase web gui and I noticed that a lot of the heap was being used (14GB). I was running a MR job and I was getting an error to the console that launched the job: Error: GC overhead limit exceeded hbase First question: is this going to hose the whole system? I didn't see the error in any of the hbase logs, so I assume that it was purely a client issue. So, naively thinking that maybe the GC had moved everything to permgen and just wasn't cleaning up, I thought I would do a rolling restart of my region servers and see if that cleared everything up. The first server I killed happened to be the one that was hosting the .META. table. Subsequently the web gui failed. Looking at the errors, it seems that the web gui essentially caches the address for the meta table and blindly tries connecting on every request. I suppose I could restart the master, but this does not seem like desirable behavior. Shouldn't the cache be refreshed on error? And since there is no real code for the GUI, just a jsp page, doesn't this mean that this behavior could be seen in other applications that use HMaster? Corrections welcome Dave
Re: Data size
HBase is column-oriented; every cell is stored with its row, family, qualifier and timestamp, so every piece of data brings larger disk usage. Without any knowledge of your keys, I can't comment much more. Then HDFS keeps a trash, so every compacted file will end up there... if you just did the import, there will be a lot of these. Finally, if you imported the data more than once, hbase keeps 3 versions by default. So in short, is it reasonable? Answer: it depends! J-D 2010/3/31 y_823...@tsmc.com: Hi, We've dumped Oracle data to files, then put these files into different HBase tables. The size of these files is 35G; we saw the HDFS usage go up to 562G after putting it into hbase. Is that reasonable? Thanks Fleming Chiu(邱宏明) 707-6128 y_823...@tsmc.com 週一無肉日吃素救地球(Meat Free Monday Taiwan)
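To make that overhead concrete with rough, assumed numbers: each KeyValue on disk carries the row key, family name, qualifier, an 8-byte timestamp and some length framing in addition to the value itself, so a cell with a 20-byte row key, a short family and qualifier, and a 10-byte value can easily occupy 50+ bytes before replication. Multiply by HDFS's default replication factor of 3, add the trash left by compactions and up to 3 retained versions, and 35G of dense export files growing into several hundred gigabytes of HBase-managed storage is plausible.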
Re: web interface is fragile?
The fact we see the exception 10 times means that getRegionServerWithRetries got that error 10 times before abandoning... Are you sure you don't see that on the region server's log located at 10.0.1.3? Thx, J-D On Wed, Mar 31, 2010 at 6:26 PM, Buttler, David buttl...@llnl.gov wrote: Hi J-D, Thanks for taking a look at this. The error that I received is: http://pastebin.com/ZnhVA5B0 This is the client side. A little strange, as I have been running this task several times in the past, and my client heap size is set to 4GB. I can try doubling it and see if that helps. Dave -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Wednesday, March 31, 2010 6:11 PM To: hbase-user@hadoop.apache.org Subject: Re: web interface is fragile? Dave, Can you pastebin the exact error that was returned by the MR job? That looks like it's client-side (from HBase point of view). WRT the .META. and the master, the web page does do a request on every hit so if the region is unavailable then you can't see it. Looks like you kill -9'ed the region server? If so, it takes a minute to detect the region server failure and then split the write-ahead-logs so if .META. was on that machine, it will take that much time to have a working web page. Instead of kill -9, simply go on the node and run ./bin/hbase-daemon.sh stop regionserver J-D On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David buttl...@llnl.gov wrote: Hi, I have a small cluster (6 nodes, 1 master and 5 region server/data nodes). Each node has lots of memory and disk (16GB of heap dedicated to RegionServers), 4 TB of disk per node for hdfs. I have a table with about 1 million rows in hbase - that's all. Currently it is split across 50 regions. I was monitoring this with the hbase web gui and I noticed that a lot of the heap was being used (14GB). I was running a MR job and I was getting an error to the console that launched the job: Error: GC overhead limit exceeded hbase First question: is this going to hose the whole system? I didn't see the error in any of the hbase logs, so I assume that it was purely a client issue. So, naively thinking that maybe the GC had moved everything to permgen and just wasn't cleaning up, I thought I would do a rolling restart of my region servers and see if that cleared everything up. The first server I killed happened to be the one that was hosting the .META. table. Subsequently the web gui failed. Looking at the errors, it seems that the web gui essentially caches the address for the meta table and blindly tries connecting on every request. I suppose I could restart the master, but this does not seem like desirable behavior. Shouldn't the cache be refreshed on error? And since there is no real code for the GUI, just a jsp page, doesn't this mean that this behavior could be seen in other applications that use HMaster? Corrections welcome Dave
Re: Error Page on wiki?
Currently we have http://wiki.apache.org/hadoop/Hbase/FAQ and http://wiki.apache.org/hadoop/Hbase/Troubleshooting Feel free to improve it! J-D On Tue, Mar 30, 2010 at 4:11 PM, Buttler, David buttl...@llnl.gov wrote: Is there an error page on the wiki listing stack traces hbase users see, and associating them with potential causes? I browsed around but didn't see it. It would be nice to capture some of the knowledge that gets distributed on the mailing list, and it would really help me to understand if the errors I am seeing have a known cause or if I am seeing something new. I will be happy to contribute my errors and solutions as soon as they are available :) Thanks, Dave
Re: Why did HBase die after a regionserver stopped.
Please provide us with the usuals: Hadoop/HBase version, configurations for both, hardware, OS, etc Also did you take a look at search38d.cm3's region server log? Any obvious exceptions and if you google search them, can you find the solution? Thx J-D On Tue, Mar 30, 2010 at 7:50 PM, 无名氏 sitong1...@gmail.com wrote: I constructed an HBase cluster, and the regionserver list is search10a.cm3 search10b.cm3 search162a.cm3 search166a.cm3 search168a.cm3 search16a.cm3 search178a.cm3 search180a.cm3 search182a.cm3 search184a.cm3 search188a.cm3 search189a.cm3 search18b.cm3 search190a.cm3 search192a.cm3 search200t.cm3 search33d.cm3 search34c.cm3 search34d.cm3 search35c.cm3 search35d.cm3 search38d.cm3 search3a.cm3 search49a.cm3 search4a.cm3 search50a.cm3 search51a.cm3 search54b.cm3 search55b.cm3 search55d.cm3 search56b.cm3 search5a.cm3 search60a.cm3 search61a.cm3 search62a.cm3 build2.cme The regionserver search38d.cm3 stopped yesterday. Now when I run the hbase shell and execute the list command, it throws an exception. NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 5 attempts. Exceptions: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) from org/apache/hadoop/hbase/client/HConnectionManager.java:1002:in `getRegionServerWithRetries' from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan' from
Re: Short DNS outage leads to No .META. found
This was fixed in https://issues.apache.org/jira/browse/HBASE-2174, will be available in 0.20.4 (or you can patch it on your 0.20.3, should apply easily). J-D On Mon, Mar 29, 2010 at 3:58 AM, Al Lias al.l...@gmx.de wrote: We have a DNS installation with HA logic that may fail for, say, 10 seconds. In such a case we experience the following: * DNS goes down * The Master gets this: Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP (Probably the IP is unknown) * The Regionservers do as directed, zookeeper logs state that /hbase/rs/ nodes are updated * DNS goes up Now there is either no master selection or a wrong one, and no region can be served anymore. Also, no other MSG_CALL_SERVER_STARTUP appears, which could reanimate the cluster... We use host names in the regionservers file. What could we change to be more robust against such a problem? Thx, Al
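A minimal sketch of that back-port (the patch filename is a placeholder for whatever is attached to the JIRA issue; 0.20 builds with ant, as trunk is the mavenized branch):

  cd hbase-0.20.3
  patch -p0 < HBASE-2174.patch
  ant jar

then roll the rebuilt jar out to the cluster and restart.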
Re: Zookeeper session lost
I see 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer than scheduled : 5000 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78781ms, ten times longer than scheduled : 3000 That means a sleeping thread slept for, in the first case, 79 seconds instead of 5. This is due to a garbage collection by the JVM, aka a stop-the-world pause. Since your region server stopped answering for that long (more than the default timeout of 60 seconds), it was considered dead, and when it figured that out it shut itself down to stop serving the regions, since they may already be served by another region server (this is why it doesn't retry to connect). This mailing list has quite a few threads about resolving that kind of problem, I suggest searching the archives (you will mainly learn about giving more than the default 1GB of heap size to HBase, making sure you don't swap, and not CPU-starving your region servers). J-D On Mon, Mar 29, 2010 at 4:27 AM, Peter Falk pe...@bugsoft.nu wrote: Hi, One of our region servers was shut down with the following messages in the log. It seems like communication with the zookeeper timed out and when it later reconnected, the session was expired and the region server then shut itself down. Seems strange to me that it should shut down, why did it not try to create a new session instead? Any ideas of how to prevent similar problems in the future? 2010-03-28 20:24:27,432 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000f to sun.nio.ch.selectionkeyi...@355811ec java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer than scheduled : 5000 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78781ms, ten times longer than scheduled : 3000 2010-03-28 20:24:27,433 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000d to sun.nio.ch.selectionkeyi...@2927fa12 java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2010-03-28 20:24:28,291 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server michelob/192.168.10.48:2181 2010-03-28 20:24:28,291 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.10.47:36626 remote=michelob/192.168.10.48:2181] 2010-03-28 20:24:28,292 INFO org.apache.zookeeper.ClientCnxn: Server connection successful 2010-03-28 20:24:28,292 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000d to sun.nio.ch.selectionkeyi...@3544d65e java.io.IOException: Session Expired at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2010-03-28 20:24:28,293 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired TIA, Peter
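For the heap-size part specifically, it is a one-line change in conf/hbase-env.sh on each region server (the 4000 MB figure is only illustrative; size it to your machines and leave headroom so the OS never swaps):

  export HBASE_HEAPSIZE=4000

A bigger heap makes long GC pauses less frequent but potentially longer, so the usual advice in these threads is to combine it with not overcommitting CPU and memory on the region server boxes.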
Re: Region assignment in Hbase
Inline. J-D On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com wrote: Hi all, I read the issue HBASE-57 ( https://issues.apache.org/jira/browse/HBASE-57 ). I don't really understand the use of assigning regions keeping DFS in mind. Can anyone give an example use case showing its advantages? A region is composed of files, files are composed of blocks. To read data, you need to fetch those blocks. In HDFS you normally have access to 3 replicas and you fetch one of them over the network. If one of the replicas is on the local datanode, you don't need to go through the network. This means less network traffic and better response time. Can map-reduce exploit this advantage in any way (if data is distributed in the above manner) or is it just the read-write performance that gets improved? MapReduce works in the exact same way, it always tries to put the computation next to where the data is. I recommend reading the MapReduce tutorial http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Overview Can someone please help me in understanding this. Regards JS
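To make the locality point concrete, here is a minimal sketch (the store file path is hypothetical; substitute a real one from under /hbase) that asks HDFS which hosts hold the replicas of each block of a file. A region server running on one of those hosts reads that block without touching the network:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockHosts {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical store file path; any HDFS file works for the demonstration.
    FileStatus status = fs.getFileStatus(new Path("/hbase/mytable/1234567890/family/storefile"));
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      // Each block reports the datanodes that hold one of its replicas.
      System.out.println("offset " + b.getOffset() + " -> " + Arrays.toString(b.getHosts()));
    }
  }
}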
Re: RegionServer Aborting
Your region server log is missing the reason for the abort, but if you had the following error in the DN log then it probably means that the RS aborted because it wasn't able to write into HDFS. Since HBase doesn't have any insight into why it's not able to contact a DN, it prefers the paranoid way and shuts itself down. If you search the mailing lists for that error, you will probably stumble upon the following configuration: <property> <name>dfs.datanode.socket.write.timeout</name> <value>0</value> </property> This is set in hdfs-site.xml; it's a config I personally use and I have never seen that problem on my clusters since. Hope this helps, J-D 2010/3/26 y_823...@tsmc.com: HDFS log java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010 remote=/10.81.47.35:34325] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) 2010-03-26 15:53:30,910 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.81.47.50:50010, storageID=DS-758373957-10.81.47.50-50010-1264018078483, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010 remote=/10.81.47.35:34325] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95) at java.lang.Thread.run(Thread.java:619) Fleming Chiu (邱宏明), 707-6128, y_823...@tsmc.com (Meat Free Monday Taiwan) On 2010/03/26 05:06 PM, y_823...@tsmc.com wrote to hbase-user: Subject: RegionServer Aborting Hi, I didn't send any command to shut down my region server, so I don't know why it shut down automatically. Any ideas? 
HBase version : 0.20.2, r834515 Hadoop version: 0.20.1, r810220 2010-03-26 15:56:59,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 10.81.47.50:60020 2010-03-26 15:57:01,797 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting 2010-03-26 15:57:01,797 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x1279807e42c0003 2010-03-26 15:57:01,797 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x1279807e42c0003 2010-03-26 15:57:01,800 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x1279807e42c0003 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x1279807e42c0003 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1279807e42c0003 closed 2010-03-26 15:57:01,915 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down 2010-03-26 15:57:02,024 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/10.81.47.50:60020 exiting 2010-03-26 15:57:06,669 INFO org.apache.hadoop.hbase.Leases: regionserver/10.81.47.50:60020.leaseChecker closing leases 2010-03-26 15:57:06,669 INFO org.apache.hadoop.hbase.Leases: regionserver/10.81.47.50:60020.leaseChecker closed leases 2010-03-26 15:57:06,670 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread. 2010-03-26 15:57:06,670 INFO
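For reference, here is the setting J-D mentions laid out as it would appear in hdfs-site.xml. A value of 0 disables the datanode's write timeout entirely; that it belongs in the config seen by both the datanodes and the HBase processes (as DFS clients) is my reading, not something stated in the thread:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>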
Re: Cannot open filename Exceptions
4 CPUs seems ok, unless you are running 2-3 MR tasks at the same time. So your value for the timeout is 240000, but did you change the tick time? The GC pause you got seemed to last almost a minute which, if you did not change the tick value, matches 3000*20 = 60000ms (so your larger session timeout is disregarded). J-D On Thu, Mar 25, 2010 at 1:07 AM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello J-D, Thank you for your reply first. How many CPUs do you have? Every server has 2 dual-core CPUs. Are you swapping? Right now I'm not sure, based on our monitoring tools, but we have written a script to record vmstat output every 2 seconds. If something wrong happens again, we can catch it. Also, if you are only using this system to batch load data or as an analytics backend, you probably want to set the timeout higher: But our value for this property is already 240000. We will try to optimize our garbage collector and see what happens. Thanks again, J-D, LvZheng 2010/3/25 Jean-Daniel Cryans jdcry...@apache.org 2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 54963ms, ten times longer than scheduled: 3000 You had an important garbage collector pause (aka pause of the world in java-speak) and your region server's session with zookeeper expired (it literally stopped responding for too long, so long it was considered dead). Are you swapping? How many CPUs do you have? If you are slowing down the garbage collecting process, it will take more time. Also, if you are only using this system to batch load data or as an analytics backend, you probably want to set the timeout higher: <property> <name>zookeeper.session.timeout</name> <value>60000</value> <description>ZooKeeper session timeout. HBase passes this to the zk quorum as suggested maximum time for a session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions The client sends a requested timeout, the server responds with the timeout that it can give the client. The current implementation requires that the timeout be a minimum of 2 times the tickTime (as set in the server configuration) and a maximum of 20 times the tickTime. Set the zk ticktime with hbase.zookeeper.property.tickTime. In milliseconds.</description> </property> This value can only be 20 times bigger than this: <property> <name>hbase.zookeeper.property.tickTime</name> <value>3000</value> <description>Property from ZooKeeper's config zoo.cfg. The number of milliseconds of each tick. See zookeeper.session.timeout description.</description> </property> So you could set tick to 6000 and timeout to 120000 for a 2min timeout.
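Putting J-D's numbers together, a minimal hbase-site.xml sketch for the suggested two-minute session; 120000 is exactly 20 x 6000, the largest timeout ZooKeeper will grant at that tick:

<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>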
Re: Which Hadoop and Hbase for stability
I meant hadoop; in hbase the svn structure is obvious. Doh! Hadoop 0.20 is pre-split so the whole thing is there http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20/ They are. We've also backported other patches that are in 0.21. CDH3 is easier than having to deal with applying your own patches and building Hadoop yourself. So do you advise a clean CDH3 release, or should I also apply the patches from http://archive.cloudera.com/cdh-3-dev-builds/hbase/ and build it myself? As far as I can see, the changes done by Cloudera don't include the patches from cdh-3-dev-builds/hbase? I'll let them answer. Thanks, MP
Re: ported lucandra: lucene index on HBase
That sounds great Thomas! You can start by adding an entry here http://wiki.apache.org/hadoop/SupportingProjects WRT becoming an HBase contrib, we have a rule that at least one committer (or a very active contributor) must be in charge and be available to fix anything broken in it due to changes in core HBase. For example, if a contrib doesn't compile before a release, we will exclude it. J-D On Thu, Mar 25, 2010 at 2:42 AM, Thomas Koch tho...@koch.ro wrote: Hi, Lucandra stores a lucene index on cassandra: http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend As the author of lucandra writes: I’m sure something similar could be built on hbase. So here it is: http://github.com/thkoch2001/lucehbase This is only a first prototype which has not been tested on anything real yet. But if you're interested, please join me to get it production ready! I propose to keep this thread on hbase-user and java-dev only. Would it make sense to aim this project to become an hbase contrib? Or a lucene contrib? Best regards, Thomas Koch, http://www.koch.ro
Re: Bulk import, HFiles, Multiple reducers and TotalOrderPartitioner
Ruslan, I see you did all the required homework but this mail is really hard to read. Can you create a jira (http://issues.apache.org/jira/browse/HBASE) and attach all the code? This will also make it easier to track. thx! J-D On Wed, Mar 24, 2010 at 6:02 PM, Ruslan Salyakhov rusla...@gmail.com wrote: Hi! I'm trying to use the bulk import that writes HFiles directly into HDFS, and I have a problem with multiple reducers. If I run the MR job that prepares the HFiles with more than one reducer, then some values for keys do not appear in the table after the loadtable.rb script execution. With one reducer everything works fine. Let's take a look at the details: Environment: - Hadoop 0.20.1 - HBase release 0.20.3 http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk - the row id must be formatted as an ImmutableBytesWritable - the MR job should ensure a total ordering among all keys http://issues.apache.org/jira/browse/MAPREDUCE-366 (patch-5668-3.txt) - TotalOrderPartitioner that uses the new API https://issues.apache.org/jira/browse/HBASE-2063 - patched HFileOutputFormat Sample data of my keys: 1.3.SWE.AB.-1.UPPLANDS-VASBY.1.1.0.1 1.306.CAN.ON.-1.LONDON.1.1.0.1 1.306.USA.CO.751.FT COLLINS.1.1.1.0 1.306.USA.CO.751.LITTLETON.1.1.1.0 4.6.USA.TX.623.MUENSTER.1.1.0.0 4.67.USA.MI.563.GRAND RAPIDS.1.1.0.0 4.68.USA.CT.533.WILLINGTON.1.1.1.0 4.68.USA.LA.642.LAFAYETTE.1.1.1.0 4.9.USA.CT.501.STAMFORD.1.1.0.0 4.9.USA.NJ.504.PRINCETON.1.1.0.1 4.92.USA.IN.527.INDIANAPOLIS.1.1.0.0 I've put everything together: 1) Test of TotalOrderPartitioner that checks how it works with my keys. Note that I've set up my comparator to pass that test: conf.setClass("mapred.output.key.comparator.class", MyKeyComparator.class, Object.class);

import java.io.IOException;
import java.util.ArrayList;
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TestTotalOrderPartitionerForHFileKeys extends TestCase {
  private static final ImmutableBytesWritable[] splitKeys = new ImmutableBytesWritable[] {
    // -inf // 0
    new ImmutableBytesWritable(Bytes.toBytes("0.27.USA.OK.650.FAIRVIEW.1.1.0.1")), // 1
    new ImmutableBytesWritable(Bytes.toBytes("0.430.USA.TX.625.Rollup.1.1.0.0")), // 2
    new ImmutableBytesWritable(Bytes.toBytes("0.9.USA.NY.501.NEW YORK.1.1.0.0")), // 3
    new ImmutableBytesWritable(Bytes.toBytes("1.103.USA.DC.511.Rollup.1.1.0.0")), // 4
    new ImmutableBytesWritable(Bytes.toBytes("1.11.CAN.QC.-1.MONTREAL.1.1.1.0")), // 5
    new ImmutableBytesWritable(Bytes.toBytes("1.220.USA.NC.Rollup.Rollup.1.1.1.0")), // 6
    new ImmutableBytesWritable(Bytes.toBytes("1.225.USA.Rollup.Rollup.Rollup.1.1.0.1")), // 7
    new ImmutableBytesWritable(Bytes.toBytes("1.245.ZAF.WC.-1.PAROW.1.1.0.1")), // 8
    new ImmutableBytesWritable(Bytes.toBytes("1.249.USA.MI.513.BAY CITY.1.1.0.0")) // 9
  };

  private static final ArrayList<CheckImmutableBytesWritable> testKeys = new ArrayList<CheckImmutableBytesWritable>();
  static {
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("0.10.USA.CA.825.SAN DIEGO.1.1.0.1")), 0));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("0.103.FRA.J.-1.PARIS.1.1.0.1")), 0));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("0.3.GBR.SCT.826032.PERTH.1.1.0.1")), 1));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("0.42.GBR.ENG.Rollup.Rollup.1.1.0.1")), 1));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("0.7.USA.CA.807.SANTA CLARA.1.1.0.0")), 2));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("1.10.SWE.AB.-1.STOCKHOLM.1.1.0.0")), 3));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("1.108.ABW.Rollup.Rollup.Rollup.1.1.0.0")), 4));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("1.11.CAN.NB.-1.SACKVILLE.1.1.0.1")), 4));
    testKeys.add(new CheckImmutableBytesWritable(new ImmutableBytesWritable(Bytes.toBytes("1.11.CAN.Rollup.Rollup.Rollup.1.1.0.0")), 5));
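For readers trying to reproduce the setup, this is roughly how the pieces Ruslan lists would be wired together in the job driver. It assumes the new-API TotalOrderPartitioner backported via MAPREDUCE-366 and the patched HFileOutputFormat from HBASE-2063, and MyKeyComparator is his own class, so treat this as a sketch rather than a recipe:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class HFilePrepareJob {
  public static Job create(Configuration conf) throws Exception {
    // The same comparator (MyKeyComparator, from Ruslan's mail) must order
    // both the map output and the split keys in the partition file,
    // otherwise reducers receive overlapping key ranges.
    conf.setClass("mapred.output.key.comparator.class", MyKeyComparator.class, Object.class);
    Job job = new Job(conf, "hfile-prepare");
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    job.setOutputFormatClass(HFileOutputFormat.class);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    // Sequence file of split keys, one fewer than the reducer count.
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/tmp/partitions.seq"));
    return job;
  }
}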
Re: Batch query?
Not yet: http://issues.apache.org/jira/browse/HBASE-1845 J-D On Thu, Mar 25, 2010 at 1:29 PM, Geoff Hendrey ghend...@decarta.com wrote: Is there a way to submit multiple Get queries in a batch? -geoff
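Until HBASE-1845 lands, the workaround is simply a loop of single Gets, one RPC per row; a minimal sketch against the 0.20 client API (table, family, and row names are made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ManyGets {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    List<Result> results = new ArrayList<Result>();
    for (String row : new String[] { "row1", "row2", "row3" }) {
      Get get = new Get(Bytes.toBytes(row));
      get.addFamily(Bytes.toBytes("family"));
      // One round-trip per row; batching these is exactly what HBASE-1845 is about.
      results.add(table.get(get));
    }
    System.out.println(results.size() + " rows fetched");
  }
}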
Re: Which Hadoop and Hbase for stability
What about doing the same with hadoop - using trunk? The reasons were presented at the last HUG, see Jonathan's and Todd's presentations http://wiki.apache.org/hadoop/HBase/HBasePresentations As far as I can see there is no branch for 0.20, or is it simply trunk? http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.20/ Why are not all patches from the Cloudera pack available in jira? They usually are or will be AFAIK. I'll let them answer. As far as I can see, also not all of the available ones were applied to the repository version. Are they not reviewed/accepted? HDFS-200 for example will never be applied; in hadoop 0.21 and the next releases we will be depending on HDFS-265. The presentations I linked give more details. The rest are works in progress or in the same situation as HDFS-200. Which version of the hadoop jar should I include with hbase? The one that you will be using? Is the one shipped with trunk appropriate for patched hadoop? So is the one in the branch, IIRC. Lots of questions ;) Thanks for help Michal 2010/3/24 Jean-Daniel Cryans jdcry...@apache.org: 0.20.4 will contain the necessary improvements required to use HDFS-200 in an efficient way, so instead of starting from 0.20.3 you should instead check out the head of the 0.20 branch. Currently there's no released hadoop version with HDFS-200, but Cloudera made it public that CDH3 (or some version of it) will contain the necessary hadoop patches for HBase. See http://archive.cloudera.com/cdh-3-dev-builds/hbase/ for their current list of hadoop patches to support HBase. I think we should soon release a beta of HBase 0.20.4 since it already contains tons of improvements and offers data durability when used with HDFS-200. J-D 2010/3/24 Michał Podsiadłowski podsiadlow...@gmail.com: Hi HBase fans, I'm trying to prepare as stable a version of our HBase/Hadoop stack as possible. This involves adding some patches. As a base I want to use hbase 0.20.3 and hadoop 0.20.2. Which patches should I apply for hdfs and HBase? AFAIK for hadoop https://issues.apache.org/jira/browse/HDFS-200 https://issues.apache.org/jira/browse/HDFS-127 https://issues.apache.org/jira/browse/HDFS-826 For hbase https://issues.apache.org/jira/browse/HBASE-2244 which we already had the pleasure to experience ;) Is there anything else available which improves handling of disaster situations ;)? Are there any patches which are only client- or server-specific? Which patches are bundled into the hadoop that is shipped with hbase? (HDFS-826, HDFS-200, anything else?) Is there a patched hadoop version already available and ready to run as a base for hbase? Thanks for help Michal
Re: Cannot open filename Exceptions
2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 54963ms, ten times longer than scheduled: 3000 You had an important garbage collector pause (aka pause of the world in java-speak) and your region server's session with zookeeper expired (it literally stopped responding for too long, so long it was considered dead). Are you swapping? How many CPUs do you have? If you are slowing down the garbage collecting process, it will take more time. Also, if you are only using this system to batch load data or as an analytics backend, you probably want to set the timeout higher: <property> <name>zookeeper.session.timeout</name> <value>60000</value> <description>ZooKeeper session timeout. HBase passes this to the zk quorum as suggested maximum time for a session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions The client sends a requested timeout, the server responds with the timeout that it can give the client. The current implementation requires that the timeout be a minimum of 2 times the tickTime (as set in the server configuration) and a maximum of 20 times the tickTime. Set the zk ticktime with hbase.zookeeper.property.tickTime. In milliseconds.</description> </property> This value can only be 20 times bigger than this: <property> <name>hbase.zookeeper.property.tickTime</name> <value>3000</value> <description>Property from ZooKeeper's config zoo.cfg. The number of milliseconds of each tick. See zookeeper.session.timeout description.</description> </property> So you could set tick to 6000 and timeout to 120000 for a 2min timeout. J-D On Wed, Mar 24, 2010 at 8:01 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, Yesterday we got another problem with a zookeeper session expiring, leading to an rs shutdown, which never happened before. I googled it and found some docs about it, but did not really get certain about how it happened and how to avoid it. I have now put the corresponding logs at http://rapidshare.com/files/367820690/208-0324.log.html. Looking forward to your reply. Thank you. LvZheng 2010/3/24 Zheng Lv lvzheng19800...@gmail.com Hello Stack, Thank you for your explanations, they are very helpful. If I get something new, I'll contact you. Regards, LvZheng 2010/3/24 Stack st...@duboce.net On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Stack, So, for sure ugly stuff is going on. I filed https://issues.apache.org/jira/browse/HBASE-2365. It looks like we're doubly assigning a region. Can you tell me how this happened in detail? Thanks a lot. Yes. Splits are run by the regionserver. It figures a region needs to be split and goes ahead closing the parent and creating the daughter regions. It then adds edits to the meta table offlining the parent and inserting the two new daughter regions. Next it sends a message to the master telling it that a region has been split. The message contains the names of the daughter regions. On receipt of the message, the master adds the new daughter regions to the unassigned regions list so they'll be passed out the next time a regionserver checks in. Concurrently, the master is running a scan of the meta table every minute making sure all is in order. One thing it does is, if it finds unassigned regions, it adds them to the unassigned regions list (this process is what gets all regions assigned after a startup). In your case, what's happening is that there is a long period between the add of the new split regions to the meta table and the report of the split to the master. 
During this time, the master meta scan ran, found one of the daughters and went and assigned it. Then the split message came in and the daughter was assigned again! There was supposed to be protection against this happening, IIRC. Looking at the responsible code, we are trying to defend against this happening in ServerManager:

/*
 * Assign new daughter-of-a-split UNLESS its already been assigned.
 * It could have been assigned already in rare case where there was a large
 * gap between insertion of the daughter region into .META. by the
 * splitting regionserver and receipt of the split message in master (See
 * HBASE-1784).
 * @param hri Region to assign.
 */
private void assignSplitDaughter(final HRegionInfo hri) {
  MetaRegion mr = this.master.regionManager.getFirstMetaRegionForRegion(hri);
  Get g = new Get(hri.getRegionName());
  g.addFamily(HConstants.CATALOG_FAMILY);
  try {
    HRegionInterface server = master.connection.getHRegionConnection(mr.getServer());
    Result r = server.get(mr.getRegionName(), g);
    // If size >= 3 -- presume regioninfo, startcode and server -- then presume
    // that this daughter is already assigned and return.
    if (r.size() >= 3) return;
  } catch
Re: Why do we need the historian column family in .META. table?
That was a family used to keep track of region operations like open, close, compact, etc. It proved to be more troublesome than handy so we disabled this feature until coming up with a better solution. The family stayed for backward compatibility. J-D On Tue, Mar 23, 2010 at 6:50 PM, ChingShen chingshenc...@gmail.com wrote: Hi, I saw a historian column family in .META. table, but in what situation do we need the column family? thanks. Shen
Re: Adding filter to scan at remote client causes UnknownScannerException
Alex, Good job on finding your issue, which boils down to our mistake as hbase devs. 0.20.3 included fixes for the filters and changed their readFields/write behavior. This should either 1) not have been committed or 2) come with a bumped RPC version. I ran cross-version tests before releasing 0.20.3 but I did not verify filters. This could probably be automated with the EC2 tests we are planning (wink Andrew Purtell), eg running all our tests with different versions of clients and servers. J-D On Mon, Mar 22, 2010 at 5:17 AM, Alex Baranov alex.barano...@gmail.com wrote: Found the mistake. It was mine, sorry. The error was caused by using the HBase 0.20.2 jar on the client end (and 0.20.3 on the server end). Although I put the proper version in the classpath of the java command (I copied the java run command here previously), the manifest file in my client app jar had a link to the hbase 0.20.2 jar. (Btw, this happened because I'm using maven in development, and a while back I added a dependency on HBase 0.20.2 since it was (and *is now*) the only one available in public maven repos). Thank you for the support: your check made me clean up and re-double-check everything. Alex. On Mon, Mar 22, 2010 at 11:31 AM, Alex Baranov alex.barano...@gmail.com wrote: It hangs for some time. I'm not using any contribs. Thanks for the help, Alex. On Fri, Mar 19, 2010 at 11:29 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Alex, I tried your code from a remote machine against a pseudo-distributed setup and it worked well (on trunk, I didn't have a 0.20.3 setup around). When the call fails, does it return right away or does it hang for some time? Also, are you using any contrib? Thx J-D On Fri, Mar 19, 2010 at 5:28 AM, Alex Baranov alex.barano...@gmail.com wrote: Hello J-D, Thanks for helping me out! Here is the code that works if I run it on the machine that has the HBase master on it and doesn't work on a remote client box:

// CODE BEGINS
HBaseConfiguration conf = new HBaseConfiguration();
HTable hTable = new HTable(conf, "agg9");
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("qf|byday_bytype_|14656__|"));
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("agg"),
    Bytes.toBytes("count"), CompareFilter.CompareOp.GREATER, Bytes.toBytes(35));
filters.addFilter(filter);
InclusiveStopFilter stopFilter = new InclusiveStopFilter(Bytes.toBytes("qf|byday_|14739_|"));
filters.addFilter(stopFilter);
scan.setFilter(filters);
ResultScanner rs = hTable.getScanner(scan);
Result next = rs.next();
int readCount = 0;
while (next != null && readCount < 40) {
  System.out.println("Row key: " + Bytes.toString(next.getRow()));
  System.out.println("count: " + Bytes.toInt(next.getValue(Bytes.toBytes("agg"), Bytes.toBytes("count"))));
  next = rs.next();
  readCount++;
}
// CODE ENDS

If I comment out the line filters.addFilter(filter); then the code works on the remote client box as well. The client fails with the exception I provided previously. 
The bigger master log (this is the log till the end and started when I ran the client code): 2010-03-19 12:15:02,098 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 27.0 2010-03-19 12:15:41,369 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.210.71.80:39207, regionname: .META.,,1, startKey: } 2010-03-19 12:15:41,398 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 25 row(s) of meta region {server: 10.210.71.80:39207, regionname: .META.,,1, startKey: } complete 2010-03-19 12:15:41,398 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned 2010-03-19 12:15:41,548 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.210.71.80:39207, regionname: -ROOT-,,0, startKey: } 2010-03-19 12:15:41,549 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.210.71.80:39207, regionname: -ROOT-,,0, startKey: } complete 2010-03-19 12:15:45,398 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=43.474342MB (45586152), Free=156.21317MB (163801368), Max=199.6875MB (209387520), Counts: Blocks=684, Access=90114, Hit=80498, Miss=9616, Evictions=0, Evicted=0, Ratios: Hit Ratio=89.32906985282898%, Miss Ratio=10.670927911996841%, Evicted/Run=NaN 2010-03-19 12:16:02,108 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 27.0 2010-03-19 12:16:41,378 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.210.71.80:39207
Re: Adding filter to scan at remote client causes UnknownScannerException
0 2010-02-04 14:37 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.3 -rw-r--r-- 1 ubuntu ubuntu 0 2010-02-04 08:15 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.4 -rw-r--r-- 1 ubuntu ubuntu 0 2010-02-04 07:56 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.5 Is there anything else I can provide you? The command I'm using to run the client is the following (you can see that 0.20.3 version of HBase is used and also other versions you might be interested in): java -cp commons-cli-1.2.jar:commons-logging-1.1.1.jar:hadoop-0.20.1-core.jar:hbase-0.20.3.jar:hbase-0.20.3-test.jar:log4j-1.2.15.jar:test-1.0-SNAPSHOT.jar:zookeeper-3.2.2.jar com.foo.bar.client.ClientExample Thank you for your help! Alex On Thu, Mar 18, 2010 at 7:13 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Alex, Is there anything else in the region server logs before that like lease expirations? Can we see a much bigger log? Also is there anything in the .out file? Can you post a snippet of the code you are using? Thx J-D On Wed, Mar 17, 2010 at 11:25 PM, Alex Baranov alex.barano...@gmail.com wrote: To give more clarity, I'm using *not custom* filter, but standard SingleColumnValueFilter. So it's not related to classpath issues. Any help is very appreciated! Thanks, Alex. On Wed, Mar 17, 2010 at 6:00 PM, Alex Baranov alex.barano...@gmail.com wrote: Hello guys, I've got a problem while adding a filter to scanner in a client app which runs on the remote (not the one from HBase cluster) box. The same code works well and scan result is fetched very quickly if I run the client on the same box where HBase master resides. If I comment out adding filter then the scanner returns results. But with filter it keeps showing me the error below. I'm using HBase 0.20.3 on both ends. On the mailing list I saw that problems like this can arise when using different versions of HBase on server and on client, but this is not the case. Also the error like this can show up when it takes a lot of time to initialize scanner (lease time by default is 1 min), but I assume this is also not the case since without adding filter I got results very quickly. Does anyone have an idea what is going on? - in log on remote client side: Exception in thread main org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.210.71.80:39207 for region x,,1267450079067, row 'y', but failed after 10 attempts. 
Exceptions: java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1931) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1851) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:372) at com.sematext.sa.client.AggregatesAccessor.getResult(AggregatesAccessor.java:74) at com.sematext.sa.client.ClientExample.main(ClientExample.java:41) - in HBase master log: 2010-03-17 12:37:45,068 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -1 at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1877) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) - in HBase region server log/out: nothing Thank you in advance. Alex.
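Since the root cause here turned out to be a 0.20.2 client jar talking to a 0.20.3 server, one cheap guard is logging the version actually resolved on the client classpath at startup and comparing it by eye with what the cluster's web UI reports. A sketch using VersionInfo, which exists in 0.20-era HBase:

import org.apache.hadoop.hbase.util.VersionInfo;

public class ClientVersionCheck {
  public static void main(String[] args) {
    // Prints the HBase version baked into whichever jar the client resolved,
    // including jars pulled in via a manifest classpath entry.
    System.out.println("HBase client version: " + VersionInfo.getVersion()
        + ", revision " + VersionInfo.getRevision());
  }
}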
Re: Adding filter to scan at remote client causes UnknownScannerException
Alex, Is there anything else in the region server logs before that like lease expirations? Can we see a much bigger log? Also is there anything in the .out file? Can you post a snippet of the code you are using? Thx J-D On Wed, Mar 17, 2010 at 11:25 PM, Alex Baranov alex.barano...@gmail.com wrote: To give more clarity, I'm using *not custom* filter, but standard SingleColumnValueFilter. So it's not related to classpath issues. Any help is very appreciated! Thanks, Alex. On Wed, Mar 17, 2010 at 6:00 PM, Alex Baranov alex.barano...@gmail.comwrote: Hello guys, I've got a problem while adding a filter to scanner in a client app which runs on the remote (not the one from HBase cluster) box. The same code works well and scan result is fetched very quickly if I run the client on the same box where HBase master resides. If I comment out adding filter then the scanner returns results. But with filter it keeps showing me the error below. I'm using HBase 0.20.3 on both ends. On the mailing list I saw that problems like this can arise when using different versions of HBase on server and on client, but this is not the case. Also the error like this can show up when it takes a lot of time to initialize scanner (lease time by default is 1 min), but I assume this is also not the case since without adding filter I got results very quickly. Does anyone have an idea what is going on? - in log on remote client side: Exception in thread main org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.210.71.80:39207 for region x,,1267450079067, row 'y', but failed after 10 attempts. Exceptions: java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException java.io.IOException: Call to /10.210.71.80:39207 failed on local exception: java.io.EOFException at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1931) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1851) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:372) at com.sematext.sa.client.AggregatesAccessor.getResult(AggregatesAccessor.java:74) at com.sematext.sa.client.ClientExample.main(ClientExample.java:41) - in HBase master log: 2010-03-17 12:37:45,068 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -1 at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1877) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) - in HBase region server log/out: nothing Thank you in advance. Alex.
Re: New Datanode won't start, null pointer exception
Google is your friend ;) https://issues.apache.org/jira/browse/HADOOP-5687 J-D On Thu, Mar 18, 2010 at 1:29 PM, Scott skes...@weather.com wrote: We have a working 10 node cluster and are trying to add an 11th box (insert Spinal Tap joke here). The box (CentOS Linux) was built in an identical manner to the other 10 and has the same version of hadoop (0.20.2). The configs are exactly the same as on the other nodes. However, when trying to start the hadoop daemons it throws an NPE. Here is all that is written to the logs. Any idea what's causing this? / 2010-03-18 16:09:42,993 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = hadoop0b10/192.168.60.100 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 / 2010-03-18 16:09:43,058 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.NullPointerException at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:246) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) 2010-03-18 16:09:43,059 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at hadoop0b10/192.168.60.100 /
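As I read HADOOP-5687, the NPE comes from NameNode.getAddress failing to build a socket address because the datanode cannot see a usable fs.default.name. So the first thing to check on the new box would be that core-site.xml really made it over and contains something like the following (hostname and port are examples, not from the thread):

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:8020</value>
</property>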
Re: Weird HBase Shell issue with count
Zookeeper doesn't need _that_ much ;) You say you are losing your zk server... can we see the error? Pastebin? Thx J-D On Tue, Mar 16, 2010 at 11:48 PM, Michael Segel michael_se...@hotmail.com wrote: Unfortunately I can't up the ulimit easily. :-( I'll have to get an admin to do that. I did update the xceivers and set it to 2048 based on something I saw. But I'm losing my zookeeper on the node. Getting an IO error. I had the handler count high at 50 but reset it back down to 25 (the default value). From what I've read, I definitely will move the zookeeper nodes when I can find additional machines to add to the cluster. Again, any input welcome. Thx -Mike Date: Tue, 16 Mar 2010 20:30:27 -0800 Subject: Re: Weird HBase Shell issue with count From: st...@duboce.net To: hbase-user@hadoop.apache.org Oh, you've read the 'getting started' and the hbase requirements where it specifies upping ulimit and xceivers in your cluster? St.Ack On Tue, Mar 16, 2010 at 8:29 PM, Stack st...@duboce.net wrote: Is DEBUG enabled in the log4j.properties that the client can see? If not, enable it. If so, can you see the regions loading as the count progresses? Which region does it stop at? Can you try to do a get on its startkey? Does it work? St.Ack On Tue, Mar 16, 2010 at 8:25 PM, Michael Segel michael_se...@hotmail.com wrote: Ok, Still trying to track down some issues. I opened up an hbase shell and decided to use count to count the number of rows in a table. Count was flying along until it hit 150,000, then stopped. Just stood there, nothing. I started to check the other nodes in the cloud to see what was happening, and the load on the data nodes, which are also region servers, jumped up; one node jumped to 2.71. Other nodes saw some jump too, but again it doesn't make sense why the count suddenly died. I'm going to check the logs, but has anyone seen something like this? Thx -Mike
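For reference, the two knobs Stack is pointing at: the open-file limit for the user running the daemons (raised via ulimit -n or limits.conf, which indeed usually needs an admin), and the datanode transceiver cap in hdfs-site.xml, mind the property's historical misspelling. Mike's value of 2048 is echoed here as an example, not a tuned recommendation:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>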
Re: Analysing slow HBase mapreduce performance
Did you set scanner caching higher? J-D On Tue, Mar 16, 2010 at 9:10 PM, Dmitry dmi...@tellapart.com wrote: Hi all, I'm trying to analyse some issues with HBase performance in a mapreduce. I'm running a mapreduce which reads a table and just writes it out to HDFS. The table is small, roughly ~400M of data and 18M rows. I've pre-split the table into 32 regions, so that I'm not running into the problem of having one region server serve the entire table. I'm running an HBase cluster with: - 16 region servers (each on the same machine as a Hadoop tasktracker and datanode). - 1 master (on the same machine as the Hadoop jobtracker and namenode.) - Zookeeper quorum of just 1 machine (on the same machine as the master). I have LZO compression enabled for both HBase and Hadoop. Running this job takes about 5-6 minutes. Running a mapreduce reading the exact same set of data from a SequenceFile on HDFS takes only about 1 minute. What else can I do to try to diagnose this? Thanks, - Dmitry
Re: Analysing slow HBase mapreduce performance
Out of interest... to what did you set it and what was the speed-up like? J-D On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik dmi...@tellapart.com wrote: That did it. Thanks! On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Did you set scanner caching higher? J-D On Tue, Mar 16, 2010 at 9:10 PM, Dmitry dmi...@tellapart.com wrote: Hi all, I'm trying to analyse some issues with HBase performance in a mapreduce. I'm running a mapreduce which reads a table and just writes it out to HDFS. The table is small, roughly ~400M of data and 18M rows. I've pre-split the table into 32 regions, so that I'm not running into the problem of having one region server serve the entire table. I'm running an HBase cluster with: - 16 region servers (each on the same machine as a Hadoop tasktracker and datanode). - 1 master (on the same machine as the Hadoop jobtracker and namenode.) - Zookeeper quorum of just 1 machine (on the same machine as the master). I have LZO compression enabled for both HBase and Hadoop. Running this job takes about 5-6 minutes. Running a mapreduce reading the exact same set of data from a SequenceFile on HDFS takes only about 1 minute. What else can I do to try to diagnose this? Thanks, - Dmitry
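For anyone hitting the same wall: scanner caching is the number of rows shipped per next() RPC, and the 0.20-era default of 1 makes full-table scans crawl. A minimal sketch of both common ways to raise it (the value 500 is illustrative and the table name is made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class CachedScan {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Client-wide default, also picked up by MapReduce table input.
    conf.setInt("hbase.client.scanner.caching", 500);
    HTable table = new HTable(conf, "mytable");
    Scan scan = new Scan();
    scan.setCaching(500); // per-scan override: rows fetched per next() RPC
    ResultScanner scanner = table.getScanner(scan);
    int rows = 0;
    for (Result r : scanner) {
      rows++;
    }
    scanner.close();
    System.out.println(rows + " rows scanned");
  }
}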
Re: NoSuchColumnFamilyException
Ted, You aren't the first one to report that issue (I saw 2 other ppl reporting it since 2 weeks ago), looks like a real bug. Can you grep around your hbase logs for ruletable,,1268431015006 and see if there's any exception related to that region? Can you identify exactly when it happened and what was happening? Thx J-D On Fri, Mar 12, 2010 at 2:33 PM, Ted Yu yuzhih...@gmail.com wrote: Hi, When I tried to insert into ruletable, I saw: hbase(main):003:0> put 'ruletable', 'com.yahoo.www', 'lpm_1.0:category', '1123:1' NativeException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family lpm_1.0 does not exist in region ruletable,,1268431015006 in table {NAME => 'ruletable', FAMILIES => []} at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2375) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1241) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) However: hbase(main):002:0> describe 'ruletable' DESCRIPTION {NAME => 'ruletable', FAMILIES => [{NAME => 'exactmatch_1.0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '1209600', TTU => '1123300', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'lpm_1.0', COMPRESSION => 'LZO', VERSIONS => '3', TTL => '15552000', TTU => '14688000', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} Can someone explain the above scenario ? Thanks
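A quick way to see the schema the cluster itself carries for that region, as opposed to what describe prints, is to read the table's row in .META. from the shell; a sketch using the region name from Ted's error:

hbase(main):001:0> scan '.META.', {STARTROW => 'ruletable,,', LIMIT => 1}

If the info:regioninfo cell there also shows empty FAMILIES, the .META. entry itself was damaged, which would match the exception being raised on the region server side.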
Re: Table left unresponsive after Thrift socket timeout
Joe, We'll need to learn what happened to that region; they usually don't throw up after a few inserts ;) So in that region server's log, before you tried disabling that table, do you see anything wrong (exceptions probably)? If you have a web server, it would be nice to drop the full RS log and the master log on it. thx! J-D On Wed, Mar 10, 2010 at 5:54 PM, Joe Pepersack j...@pepersack.net wrote: On 03/10/2010 07:58 PM, Jean-Daniel Cryans wrote: Which HBase version? What's your hardware like? How much data were you inserting? Did you grep the region server logs for any IOException or such? Can we see an excerpt of those logs around the time of the lock up? Version: 0.20.3-1.cloudera Hardware: dual 4-core Xeons, 16G, 1.7T disk. 10x nodes: 1 master, 1 secondary master, 8x regionservers; 2x zookeepers running on regionservers. It appears to have died after only a few rows were inserted. There's only one region shown on the status page. Curiously, that region does NOT show up in the list of online regions for the listed regionserver. Master log, from the point where I ran drop 'Person' in the shell: 2010-03-10 20:44:44,812 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.40.0.37:60020, regionname: -ROOT-,,0, startKey:} 2010-03-10 20:44:44,815 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.40.0.37:60020, regionname: -ROOT-,,0, startKey:} complete 2010-03-10 20:44:44,836 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.40.0.36:60020, regionname: .META.,,1, startKey:} 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 3 row(s) of meta region {server: 10.40.0.36:60020, regionname: .META.,,1, startKey:} complete 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. 
region(s) scanned 2010-03-10 20:44:45,357 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 1.2 2010-03-10 20:45:03,209 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-03-10 20:45:03,209 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-03-10 20:45:03,210 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region Person,,1268251509658 to setClosing list 2010-03-10 20:45:04,260 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-03-10 20:45:04,260 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-03-10 20:45:04,260 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region Person,,1268251509658 to setClosing list 2010-03-10 20:45:05,273 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-03-10 20:45:05,273 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-03-10 20:45:05,273 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region Person,,1268251509658 to setClosing list 2010-03-10 20:45:06,287 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-03-10 20:45:06,287 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-03-10 20:45:06,287 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region Person,,1268251509658 to setClosing list 2010-03-10 20:45:08,301 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions 2010-03-10 20:45:08,301 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Processing regions currently being served 2010-03-10 20:45:08,301 DEBUG org.apache.hadoop.hbase.master.ChangeTableState: Adding region Person,,1268251509658 to setClosing list Log from the region server where the region is supposed to be for the same time frame: 2010-03-10 20:43:50,889 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN 2010-03-10 20:44:50,889 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0, Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN 2010-03-10 20:45:04,058 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: Person,,1268251509658 2010-03-10 20:45:04,059 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE: Person,,1268251509658 2010-03-10 20:45:05,062 INFO