Re: Throttle replication speed in case of datanode failure

2013-01-17 Thread Jean-Daniel Cryans
Since this is a Hadoop question, it should be sent to
user@hadoop.apache.org (which I'm now sending this to; I've put
user@hbase in BCC).

J-D

On Thu, Jan 17, 2013 at 9:54 AM, Brennon Church bren...@getjar.com wrote:
 Hello,

 Is there a way to throttle the speed at which under-replicated blocks are
 copied across a cluster?  Either limiting the bandwidth or the number of
 blocks per time period would work.

 I'm currently running Hadoop v1.0.1.  I think the
 dfs.namenode.replication.work.multiplier.per.iteration option would do the
 trick, but that is in v1.1.0 and higher.

 Thanks.

 --Brennon
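
For reference, on 1.1.0 and later the work-multiplier setting mentioned above
lives in hdfs-site.xml on the NameNode; a minimal sketch (the value shown is
only illustrative, the shipped default is 2):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>4</value>
  <!-- higher = more under-replicated blocks scheduled per iteration -->
</property>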


Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Jean-Daniel Cryans
Fifth offense.

Yuan Jin is out of the office. - I will be out of the office starting
06/22/2012 and will not return until 06/25/2012. I am out of
Jun 21

Yuan Jin is out of the office. - I will be out of the office starting
04/13/2012 and will not return until 04/16/2012. I am out of
Apr 12

Yuan Jin is out of the office. - I will be out of the office starting
04/02/2012 and will not return until 04/05/2012. I am out of
Apr 2

Yuan Jin is out of the office. - I will be out of the office starting
02/17/2012 and will not return until 02/20/2012. I am out of
Feb 16


On Mon, Jul 23, 2012 at 1:09 PM, Yuan Jin jiny...@cn.ibm.com wrote:


 I am out of the office until 07/25/2012.

 I am out of office.

 For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM)
 For CFM related things, you can contact Daniel(Liang SH Su/China/Contr/IBM)
 For TMB related things, you can contact Flora(Jun Ying Li/China/IBM)
 For TWB related things, you can contact Kim(Yuan SH Jin/China/IBM)
 For others, I will reply you when I am back.


 Note: This is an automated response to your message  Reducer
 MapFileOutpuFormat sent on 24/07/2012 4:09:51.

 This is the only notification you will receive while this person is away.


Re: Hbase DeleteAll is not working

2012-05-14 Thread Jean-Daniel Cryans
Please don't cross-post, your question is about HBase not MapReduce
itself so I put mapreduce-user@ in BCC.

0.20.3 is, relative to the age of the project, as old as my
grandmother, so you should consider upgrading to 0.90 or 0.92, which
are both pretty stable.

I'm curious about the shell's behavior you are encountering. Would it
be possible for you to show us the exact trace of what you are doing
in the shell?

To be clear, here's what I'd like to see (a sample session sketch follows):

- A get of the row you want to delete. Feel free to zero out the values.
- A deleteall of that row.
- Another get of that row.
- A delete of a column (that should work according to your email).
- A last get of that row.
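
A minimal shell session along those lines (table, row, and column names here
are placeholders) might look like:

  hbase> get 't1', 'row1'
  hbase> deleteall 't1', 'row1'
  hbase> get 't1', 'row1'
  hbase> delete 't1', 'row1', 'family1:a'
  hbase> get 't1', 'row1'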

Thx,

J-D

On Sun, May 13, 2012 at 9:57 PM, Mahesh Balija
balijamahesh@gmail.com wrote:
 Hi,

          I am trying to delete the whole row from hbase in my production
 cluster in two ways,
            1) I have written a mapreduce program to remove many rows which
 satisfy certain condition to do that,
                 The key is the hbase row key only, and the value is
 Delete, I am initializing the delete object with the Key.
                  Delete delete = new Delete(key.get());
                    context.write(key, delete);
             2) From the command line I am trying to delete the selected
 record using deleteall command,

            Both of these are not working, i.e., none of the records are
 being deleted from HBase, but if I separately delete the individual
 columns through the command line then the record does get deleted once I
 remove all the individual columns. My HBase version is hbase-0.20.3 and my
 Hadoop version is 0.20.2.

          Please suggest whether I am doing anything wrong or whether this is
 known weird behavior of HBase.

 Thanks,
 Mahesh.B.


Re: Doubt from the book Definitive Guide

2012-04-05 Thread Jean-Daniel Cryans
On Thu, Apr 5, 2012 at 7:03 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 The only advantage I was thinking of was that in some cases reducers might be
 able to take advantage of data locality and avoid multiple HTTP calls, no?
 The data is written anyway, so the last merged file could go on HDFS instead of
 local disk.
 I am new to Hadoop, so I'm just asking to understand the rationale
 behind using local disk for the final output.

So basically it's a tradeoff: you get more replicas to copy from,
but you have two more copies to write. Considering that the data is very
short-lived and that it doesn't need to be replicated (since if the
machine fails the maps are replayed anyway), writing two
replicas that are potentially never used would be hurtful.

Regarding locality, it might make sense on a small cluster, but the
more nodes you add, the smaller the chance of having local replicas for
each block of data you're looking for.

J-D


Re: Fairscheduler - disable default pool

2012-03-13 Thread Jean-Daniel Cryans
We do it here by setting this:

<poolMaxJobsDefault>0</poolMaxJobsDefault>

So that you _must_ have a pool (that's configured with a different
maxRunningJobs) in order to run jobs.
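
For the archives, a minimal fair-scheduler allocations file along those lines
(the pool name and limit are just placeholders):

<?xml version="1.0"?>
<allocations>
  <poolMaxJobsDefault>0</poolMaxJobsDefault>
  <pool name="production">
    <maxRunningJobs>20</maxRunningJobs>
  </pool>
</allocations>

With that, jobs that end up in the default pool should just sit queued rather
than run.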

Hope this helps,

J-D

On Tue, Mar 13, 2012 at 10:49 AM, Merto Mertek masmer...@gmail.com wrote:
 I know that by design all unmarked jobs go to that pool; however, I am
 doing some testing and I am interested in whether it is possible to disable it.

 Thanks


Re: Regarding Parallel Iron's claim

2011-12-08 Thread Jean-Daniel Cryans
Isn't that old news?

http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/

Googling around, it doesn't seem anything happened after that.

J-D

On Thu, Dec 8, 2011 at 6:52 PM, JS Jang jsja...@gmail.com wrote:
 Hi,

 Does anyone know of any discussion in Apache Hadoop regarding the claim by
 Parallel Iron, with their patent, against the use of HDFS?
 Thanks in advance.

 Regards,
 JS




Re: Regarding Parallel Iron's claim

2011-12-08 Thread Jean-Daniel Cryans
You could just look at the archives:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/

It is also indexed by all search engines.

J-D

On Thu, Dec 8, 2011 at 7:44 PM, JS Jang jsja...@gmail.com wrote:
 I appreciate your help, J-D.
 Yes, I wondered whether there was any update since or previous discussion
 within Apache Hadoop as I am new in this mailing list.


 On 12/9/11 12:19 PM, Jean-Daniel Cryans wrote:

 Isn't that old news?

 http://www.dbms2.com/2011/06/10/patent-nonsense-parallel-ironhdfs-edition/

 Googling around, it doesn't seem anything happened after that.

 J-D

 On Thu, Dec 8, 2011 at 6:52 PM, JS Jangjsja...@gmail.com  wrote:

 Hi,

 Does anyone know of any discussion in Apache Hadoop regarding the claim by
 Parallel Iron, with their patent, against the use of HDFS?
 Thanks in advance.

 Regards,
 JS




 --
 
 장정식 / jsj...@gruter.com
 Gruter Inc., Principal, R&D Team
 www.gruter.com
 Cloud, Search and Social
 



Re: Hadoop 0.21

2011-12-06 Thread Jean-Daniel Cryans
Yep.

J-D

On Tue, Dec 6, 2011 at 10:41 AM, Saurabh Sehgal saurabh@gmail.com wrote:
 Hi All,

 According to the Hadoop release notes, version 0.21.0 should not be
 considered stable or suitable for production:

 23 August, 2010: release 0.21.0 available
 This release contains many improvements, new features, bug fixes and
 optimizations. It has not undergone testing at scale and should not be
 considered stable or suitable for production. This release is being
 classified as a minor release, which means that it should be API
 compatible with 0.20.2.


 Is this still the case ?

 Thank you,

 Saurabh


Re: Version of Hadoop That Will Work With HBase?

2011-12-06 Thread Jean-Daniel Cryans
For the record, this thread was started from another discussion in
user@hbase. 0.20.205 does work with HBase 0.90.4; I think the OP was a
little too quick in saying it doesn't.

J-D

On Tue, Dec 6, 2011 at 11:44 AM,  jcfol...@pureperfect.com wrote:

 Sadly, CDH3 is not an option although I wish it was. I need to get an
 official release of HBase from apache to work.

 I've tried every version of HBase 0.89 and up with 0.20.205 and all of
 them throw EOFExceptions. Which version of Hadoop core should I be
 using? HBase 0.94 ships with a 20-append version which doesn't work
 (throws an EOFException), but when I tried replacing it with the
 hadoop-core included with hadoop 0.20.205 I still got the same
 exception.

 Thanks


   Original Message 
  Subject: Re: Version of Hadoop That Will Work With HBase?
  From: Harsh J ha...@cloudera.com
  Date: Tue, December 06, 2011 2:32 pm
  To: common-user@hadoop.apache.org

  0.20.205 should work, and so should CDH3 or 0.20-append branch builds
  (no longer maintained, after 0.20.205 replaced it though).

  What problem are you facing? Have you ensured HBase does not have a
  bad hadoop version jar in its lib/?

  On Wed, Dec 7, 2011 at 12:55 AM, jcfol...@pureperfect.com wrote:
  
  
   Hi,
  
  
   Can someone please tell me which versions of hadoop contain the
   20-appender code and will work with HBase? According to the Hbase
 docs
   (http://hbase.apache.org/book/hadoop.html), Hadoop 0.20.205 should
 work
   with HBase but it does not appear to.
  
  
   Thanks!
  



  --
  Harsh J



Re: Adjusting column value size.

2011-10-06 Thread Jean-Daniel Cryans
(BCC'd common-user@ since this seems strictly HBase related)

Interesting question... And you probably need all those ints at the same
time right? No streaming? I'll assume no.

So the second solution seems better due to the overhead of storing each
cell. Basically, storing one int per cell, you would end up storing more keys
than values (size-wise).
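
A rough sketch of the packing approach in Java, assuming plain 4-byte
big-endian ints concatenated into one cell value (class and method names are
made up):

import java.nio.ByteBuffer;

public class IntPacking {
  // Pack a batch of ints into a single byte[] destined for one cell value.
  public static byte[] pack(int[] batch) {
    ByteBuffer buf = ByteBuffer.allocate(4 * batch.length);
    for (int i : batch) {
      buf.putInt(i);
    }
    return buf.array();
  }

  // Unpack a cell value back into ints, 4 bytes at a time.
  public static int[] unpack(byte[] value) {
    ByteBuffer buf = ByteBuffer.wrap(value);
    int[] out = new int[value.length / 4];
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getInt();
    }
    return out;
  }
}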

Another thing is that if you pack enough ints together and there's some sort
of repetition, you might be able to use LZO compression on that table.

I'd love to hear about your experimentations once you've done them.

J-D

On Mon, Oct 3, 2011 at 10:58 PM, edward choi mp2...@gmail.com wrote:

 Hi,

 I have a question regarding the performance and column value size.
 I need to store per row several million integers. (Several million is
 important here)
 I was wondering which method would be more beneficial performance wise.

 1) Store each integer to a single column so that when a row is called,
 several million columns will also be called. And the user would map each
 column value to some kind of container (ex: vector, arrayList)
 2) Store, for example, a thousand integers into a single column (by
 concatenating them) so that when a row is called, only several thousand
 columns will be called along. The user would have to split the column value
 into 4 bytes and map the split integer to some kind of container (ex:
 vector, arrayList)

 I am curious which approach would be better. 1) would call several million
 columns but no additional processing is needed. 2) would call only several
 thousand columns but additional processing is needed.
 Any advice would be appreciated.

 Ed



Re: Using HBase for real time transaction

2011-09-21 Thread Jean-Daniel Cryans
On Wed, Sep 21, 2011 at 8:36 AM, Jignesh Patel jign...@websoft.com wrote:
  I am not looking for a relational database, but at creating a multi-tenant
 database. At this time I am not sure whether it needs transactions, or
 even whether that kind of architecture can support transactions.

Currently in HBase nothing prevents you from having multiple tenants,
as long as they have different table names. Also keep in mind that
there's no security implemented, but it *might* make it into 0.92
(crossing fingers).

 Row mutations in HBase are seen by the user as soon as they are done,
 atomicity is guaranteed at the row level, which seems to satisfy his
 requirement. If multi-row transactions are needed then I agree HBase
 might not be what he wants.

 Can't we handle transactions through the application or container, before data
 even goes to HBase?

Sure, you could do something like what Megastore[1] does, but you
really need to evaluate your needs and see if that works.


 And I do have one more doubt: how do we achieve low read latency?


HBase offers that out of the box; a more precise question would be
what 99th percentile read latency you need. Just for the sake of
giving a data point, right now our 99p is 20ms, but that's with our
type of workload, machines, front-end caching, etc., so YMMV.

J-D

1. Megastore (transactions are described in chapter 3.3):
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Re: Using HBase for real time transaction

2011-09-20 Thread Jean-Daniel Cryans
While HBase isn't ACID-compliant, it does have some guarantees:

http://hbase.apache.org/acid-semantics.html

J-D

On Tue, Sep 20, 2011 at 2:56 PM, Michael Segel
michael_se...@hotmail.com wrote:

 Since Tom isn't technical... ;-)

 The short answer is No.
 HBase is not capable of being a transactional database because it doesn't
 support transactions.
 Nor is HBase ACID compliant.

 Having said that, yes you can use HBase to serve data in real time.

 HTH

 -Mike


 Subject: Re: Using HBase for real time transaction
 From: jign...@websoft.com
 Date: Tue, 20 Sep 2011 17:25:17 -0400
 To: common-user@hadoop.apache.org

 Tom,
 Let me reword: can HBase be used as a transactional database (i.e. as a
 replacement for MySQL)?

 The requirement is to have real time read and write operations. I mean as
 soon as data is written the user should see the data (here the data would be
 written in HBase).

 -Jignesh


 On Sep 20, 2011, at 5:11 PM, Tom Deutsch wrote:

  Real-time means different things to different people. Can you share your
  latency requirements from the time the data is generated to when it needs
  to be consumed, or how you are thinking of using Hbase in the overall
  flow?
 
  
  Tom Deutsch
  Program Director
  CTO Office: Information Management
  Hadoop Product Manager / Customer Exec
  IBM
  3565 Harbor Blvd
  Costa Mesa, CA 92626-1420
  tdeut...@us.ibm.com
 
 
 
 
  Jignesh Patel jign...@websoft.com
  09/20/2011 12:57 PM
  Please respond to
  common-user@hadoop.apache.org
 
 
  To
  common-user@hadoop.apache.org
  cc
 
  Subject
  Using HBase for real time transaction
 
 
 
 
 
 
  We are exploring possibility of using HBase for the real time
  transactions. Is that possible?
 
  -Jignesh
 




Re: Using HBase for real time transaction

2011-09-20 Thread Jean-Daniel Cryans
 I think there has to be some clarification.

 The OP was asking about a MySQL replacement.
 HBase will never be an RDBMS replacement.  No transactions means no way of
 doing OLTP.
 It's the wrong tool for that type of work.

Agreed, if you are looking to handle relational data in a relational
fashion, it might be better to look elsewhere.

 Recognize what HBase is and what it is not.

Not sure what you're referring to here.

 This doesn't mean you can't take in or deliver data in real time; it can.
 So if you want to use it in a real time manner, sure. Note that like with 
 other databases, you will have to do some work to handle real time data.
 I guess you would have to provide a specific use case for what you want to
 achieve in order to know if it's a good fit.

He says:

 The requirement is to have real time read and write operations. I mean as 
 soon as data is written the user should see the data(Here data should be 
 written in Hbase).

Row mutations in HBase are seen by the user as soon as they are done,
atomicity is guaranteed at the row level, which seems to satisfy his
requirement. If multi-row transactions are needed then I agree HBase
might not be what he wants.

J-D


Re: Regarding design of HDFS

2011-08-25 Thread Jean-Daniel Cryans
In order to get an answer to that sort of question, you first must
show that you did your own homework, e.g. write down what you think the
answer is based on your observations and readings; then I'm sure
someone will be happy to help you.

J-D

On Thu, Aug 25, 2011 at 1:04 AM, Sesha Kumar sesha...@gmail.com wrote:
 Hi all,
 I am trying to get a good understanding of how Hadoop works, for my
 undergraduate project. I have the following questions/doubts :
 1. Why does the namenode store the blockmap (block to datanode mapping) in
 main memory for all files, even those that are not used?
 2. Why can't the namenode move part of the blockmap out of main memory to a
 secondary storage device when free space in main memory becomes scarce
 (due to a large number of files)?
 3. Why can't the blockmap be constructed when a file is requested (by a
 client) and then be cached for later accesses?


Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?

2011-05-17 Thread Jean-Daniel Cryans
Hey Tim,

It looks like you are running with only 1 replica so my first guess is
that you only have 1 datanode and it's writing to /tmp, which was
cleaned at some point.
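
If that turns out to be the case, the usual fix is to point the storage
directories somewhere persistent in hdfs-site.xml (the paths below are just
placeholders), restart, and re-load the data:

<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>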

J-D

On Tue, May 17, 2011 at 5:13 PM, Time Less timelessn...@gmail.com wrote:
 I loaded data into HDFS last week, and this morning I was greeted with this
 on the web interface: WARNING : There are about 32 missing blocks. Please
 check the log or run fsck.

 I ran fsck and see several missing and corrupt blocks. The output is
 verbose, so here's a small sample:

 /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
 CORRUPT block blk_-5745991833770623132
 /tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
 MISSING 1 blocks of total size 2945889 B
 /user/hive/warehouse/player_game_stat/2011-01-15/datafile: CORRUPT block
 blk_1642129438978395720
 /user/hive/warehouse/player_game_stat/2011-01-15/datafile: MISSING 1 blocks
 of total size 67108864 B

 Sometimes the number of dots after the B is quite large (several lines
 long). Some of these are tmp files, but many are important. If this cluster
 were prod, I'd have some splaining to do. I need to determine what caused
 this corruption.

 Questions:

 What are the dots after the B? What is the significance of the number of
 them?
 Does anyone have suggestions where to start?
 Are there typical misconfigurations or issues that cause corruption or
 missing files?
 What is the log that the NameNode web interface refers to?

 Thanks for any infos! I'm... nervous. :)
 --
 Tim Ellis
 Riot Games




Re: distcp problems going from hadoop-0.20.1 to -0.20.2

2011-04-23 Thread Jean-Daniel Cryans
Errr really? Well shipping a bunch of hard drives should be faster.
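
For the record, the cross-version copy from the doc quoted below is normally
run on the destination (0.20.2) cluster against the source NameNode's
read-only HFTP interface; hosts and ports here are placeholders (50070 being
the usual dfs.http.address port):

  hadoop distcp hftp://old-nn.example.com:50070/path/to/data \
                hdfs://new-nn.example.com:8020/path/to/data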

J-D
On Apr 23, 2011 12:17 AM, Jonathan Disher jdis...@parad.net wrote:
 Aha, that works.

 Any ideas what kind of throughput I can expect, or suggestions for making
this run as fast as possible? Obviously exact numbers will depend on cluster
config, I won't bore you with the details, but... 10mbit? 100mbit? A
gigabit? I've got about 112TB of data to move from the East coast to the
West coast, and sooner would be better than later :)

 -j

 On Apr 22, 2011, at 10:38 PM, Jean-Daniel Cryans wrote:

 See Copying between versions of HDFS:
 http://hadoop.apache.org/common/docs/r0.20.2/distcp.html#cpver

 J-D

 On Fri, Apr 22, 2011 at 10:37 PM, Jonathan Disher jdis...@parad.net
wrote:
 I have an existing cluster running hadoop-0.20.1, and I am migrating
most of the data to a new cluster running -0.20.2. I am seeing this in the
namenode logs when I try to run a distcp:

 @40004db263bf29c77134 WARN ipc.Server: Incorrect header or version
mismatch from newNN:46111 got version 4 expected version 3
 2011-04-23 05:30:55,999 WARN org.apache.hadoop.ipc.Server: Incorrect
header or version mismatch from oldNN:48750 got version 3 expected version 4

 When I run my distcp, on either side, it dies with a
java.io.IOException/java.io.EOFException.

 Ideas? Am I screwed? I really don't want to drop my new cluster down to
0.20.1.

 -j



Re: HDFS + ZooKeeper

2011-04-22 Thread Jean-Daniel Cryans
This is a 1M$ question. You could start thinking about this problem by
looking at what AvatarNode does:
https://issues.apache.org/jira/browse/HDFS-976

J-D

On Fri, Apr 22, 2011 at 10:17 PM, Ozcan ILIKHAN ilik...@cs.wisc.edu wrote:
 Hi,
 Does anyone have any idea about how we can use HDFS with ZooKeeper? More
 elaborately: if the NameNode fails, DataNodes should be able to retrieve the
 address of the new NameNode from ZooKeeper.

 Thanks,
 -
 Ozcan ILIKHAN
 PhD Student, Graduate Research Assistant
 Department of Computer Sciences
 University of Wisconsin-Madison
 http://pages.cs.wisc.edu/~ilikhan



Re: Hadoop in Canada

2011-03-29 Thread Jean-Daniel Cryans
(moving to general@ since this is not a question regarding the usage
of the hadoop commons, which I BCC'd)

I moved from Montreal to SF a year and a half ago because I saw two
things: 1) companies weren't interested (they are still trying to get
rid of COBOL or worse) or didn't have the data to use Hadoop (not
enough big companies), and 2) the universities were either uninterested
or just amused by this newcomer. I know of one company that really
does cool stuff with Hadoop in Montreal and it's Hopper
(www.hopper.travel, they are still in closed alpha AFAIK), who also
organized hackreduce.org last weekend. This is what their CEO had to
say to the question "Is there something you would do differently now
if you would start it over?":

Move to the Valley.

(see the rest here
http://nextmontreal.com/product-market-fit-hopper-travel-fred-lalonde/)

I'm sure there are a lot of other companies that are either
considering using or already using Hadoop to some extent in Canada
but, like anything else, only a portion of them are interested in
talking about it or even organizing an event.

I would actually love to see something getting organized and I'd be on
the first plane to Y**, but I'm afraid that to achieve any sort of
critical mass you'd have to fly in people from all the provinces. Air
Canada becomes a SPOF :P

Now that I think about it, there's probably enough Canucks around here
that use Hadoop that we could have our own little user group. If you
want to have a nice vacation and geek out with us, feel free to stop
by and say hi.

/rant

J-D

On Tue, Mar 29, 2011 at 6:21 AM, James Seigel ja...@tynt.com wrote:
 Hello,

 You might remember me from a couple of weeks back asking if there were any 
 Calgary people interested in a “meetup” about #bigdata or using hadoop.  
 Well, I’ve expanded my search a little to see if any of my Canadian brothers 
 and sisters are using the elephant for good or for evil.  It might be harder 
 to grab coffee, but it would be fun to see where everyone is.

 Shout out if you’d like or ping me, I think it’d be fun to chat!

 Cheers
 James Seigel
 Captain Hammer at Tynt.com


Re: google snappy

2011-03-23 Thread Jean-Daniel Cryans
(Please don't cross-post like that, it only adds confusion. I put
everything in bcc and posted to general instead)

Their README says the following:

Snappy usually is faster than algorithms in the same class (e.g. LZO,
LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression
ratios.

Somebody obviously needs to publish some benchmarks, but knowing
Snappy's origin I can believe that claim.

Relevant jiras:

HADOOP-7206 Integrate Snappy compression
HBASE-3691   Add compressor support for 'snappy', google's compressor

J-D

On Wed, Mar 23, 2011 at 9:52 AM, Weishung Chung weish...@gmail.com wrote:
 Hey my fellow hadoop/hbase developers,

 I just came across this Google compression/decompression package yesterday;
 could we make good use of this compression scheme in Hadoop? It's written
 in C++ though.

 http://code.google.com/p/snappy/

 I haven't looked closely into this snappy package yet but I would love to
 know about the differences compared to LZO.

 Thank you,
 Wei Shung



Re: mapreduce streaming with hbase as a source

2011-02-22 Thread Jean-Daniel Cryans
(moving to the hbase user ML)

I think streaming used to work correctly in HBase 0.19 since the
RowResult class was giving the value (which you had to parse out), but
now that Result is made of KeyValues, and they don't include the values
in toString(), I don't see how TableInputFormat could be used as-is. You
could write your own InputFormat that wraps TIF and returns a
specific format for each cell, though.
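
A rough sketch of such a wrapper, assuming the 0.90-era
org.apache.hadoop.hbase.mapred API (ImmutableBytesWritable/Result pairs); the
class name is made up, and the streaming job would point at it with
-inputformat:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class TextTableInputFormat implements InputFormat<Text, Text>, JobConfigurable {
  private final TableInputFormat delegate = new TableInputFormat();

  public void configure(JobConf job) { delegate.configure(job); }

  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    return delegate.getSplits(job, numSplits);
  }

  public RecordReader<Text, Text> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    final RecordReader<ImmutableBytesWritable, Result> reader =
        delegate.getRecordReader(split, job, reporter);
    return new RecordReader<Text, Text>() {
      public boolean next(Text key, Text value) throws IOException {
        ImmutableBytesWritable k = reader.createKey();
        Result r = reader.createValue();
        if (!reader.next(k, r)) return false;
        key.set(Bytes.toString(k.get()));
        StringBuilder sb = new StringBuilder();
        for (KeyValue kv : r.raw()) {               // one entry per cell
          if (sb.length() > 0) sb.append(',');
          sb.append(Bytes.toString(kv.getFamily())).append(':')
            .append(Bytes.toString(kv.getQualifier())).append('=')
            .append(Bytes.toString(kv.getValue()));  // the part toString() drops
        }
        value.set(sb.toString());
        return true;
      }
      public Text createKey() { return new Text(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return reader.getPos(); }
      public float getProgress() throws IOException { return reader.getProgress(); }
      public void close() throws IOException { reader.close(); }
    };
  }
}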

Hope that somehow helps,

J-D

2011/2/19 Ondrej Holecek ond...@holecek.eu:
 I don't think you understand me correctly,

 I get this line:

 72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
 row1/family1:b/1298037744658/Put/vlen=1, 
 row1/family1:c/1298037748020/Put/vlen=1}

 I know 72 6f 77 31 is the key and the rest is the value; let's call it
 the mapreduce-value. In this mapreduce-value there is
 row1/family1:a/1298037737154/Put/vlen=1, which is the hbase-row name, hbase-column
 name and hbase-timestamp.  But I also expect the hbase-value.

 So my question is what to do to make TableInputFormat send this
 hbase-value as well.


 Ondrej


 On 02/19/11 16:41, ShengChang Gu wrote:
 By default, the prefix of a line
 up to the first tab character is the key and the rest of the line
 (excluding the tab character)
 will be the value. If there is no tab character in the line, then entire
 line is considered as key
 and the value is null. However, this can be customized, Use:

 -D stream.map.output.field.separator=.
 -D stream.num.map.output.key.fields=4

 2011/2/19 Ondrej Holecek ond...@holecek.eu mailto:ond...@holecek.eu

 Thank you, I've spent a lot of time debugging but didn't notice
 this typo :(

 Now it works, but I don't understand one thing: On stdin I get this:

 72 6f 77 31 keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
 row1/family1:b/1298037744658/Put/vlen=1,
 row1/family1:c/1298037748020/Put/vlen=1}
 72 6f 77 32 keyvalues={row2/family1:a/1298037755440/Put/vlen=2,
 row2/family1:b/1298037758241/Put/vlen=2,
 row2/family1:c/1298037761198/Put/vlen=2}
 72 6f 77 33 keyvalues={row3/family1:a/1298037767127/Put/vlen=3,
 row3/family1:b/1298037770111/Put/vlen=3,
 row3/family1:c/1298037774954/Put/vlen=3}

 I see there is everything but the value. What should I do to get the value
 on stdin too?

 Ondrej

 On 02/18/11 20:01, Jean-Daniel Cryans wrote:
  You have a typo, it's hbase.mapred.tablecolumns not
 hbase.mapred.tablecolumn
 
  J-D
 
  On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek ond...@holecek.eu
 mailto:ond...@holecek.eu wrote:
  Hello,
 
  I'm testing hadoop and hbase, I can run mapreduce streaming or
 pipes jobs agains text files on
  hadoop, but I have a problem when I try to run the same job
 against hbase table.
 
  The table looks like this:
  hbase(main):015:0 scan 'table1'
  ROWCOLUMN+CELL
 
   row1
  column=family1:a, timestamp=1298037737154,
  value=1
 
   row1
  column=family1:b, timestamp=1298037744658,
  value=2
 
   row1
  column=family1:c, timestamp=1298037748020,
  value=3
 
   row2
  column=family1:a, timestamp=1298037755440,
  value=11
 
   row2
  column=family1:b, timestamp=1298037758241,
  value=22
 
   row2
  column=family1:c, timestamp=1298037761198,
  value=33
 
   row3
  column=family1:a, timestamp=1298037767127,
  value=111
 
   row3
  column=family1:b, timestamp=1298037770111,
  value=222
 
   row3
  column=family1:c, timestamp=1298037774954,
  value=333
 
  3 row(s) in 0.0240 seconds
 
 
  And command I use, with the exception I get:
 
  # hadoop jar
 /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar -D
  hbase.mapred.tablecolumn=family1:  -input table1 -output
 /mtestout45 -mapper test-map
  -numReduceTasks 1 -reducer test-reduce -inputformat
 org.apache.hadoop.hbase.mapred.TableInputFormat
 
  packageJobJar:
 [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] []
  /tmp/streamjob8218197708173702571.jar tmpDir=null
  11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area
 
 
 hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
 
 http://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
  Exception in thread main java.lang.RuntimeException: Error in
 configuring object
 at
 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at
 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117

Re: mapreduce streaming with hbase as a source

2011-02-18 Thread Jean-Daniel Cryans
You have a typo, it's hbase.mapred.tablecolumns not hbase.mapred.tablecolumn

J-D

On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek ond...@holecek.eu wrote:
 Hello,

 I'm testing hadoop and hbase. I can run mapreduce streaming or pipes jobs
 against text files on
 hadoop, but I have a problem when I try to run the same job against an hbase
 table.

 The table looks like this:
 hbase(main):015:0 scan 'table1'
 ROW                                                COLUMN+CELL

  row1                                              column=family1:a, 
 timestamp=1298037737154,
 value=1

  row1                                              column=family1:b, 
 timestamp=1298037744658,
 value=2

  row1                                              column=family1:c, 
 timestamp=1298037748020,
 value=3

  row2                                              column=family1:a, 
 timestamp=1298037755440,
 value=11

  row2                                              column=family1:b, 
 timestamp=1298037758241,
 value=22

  row2                                              column=family1:c, 
 timestamp=1298037761198,
 value=33

  row3                                              column=family1:a, 
 timestamp=1298037767127,
 value=111

  row3                                              column=family1:b, 
 timestamp=1298037770111,
 value=222

  row3                                              column=family1:c, 
 timestamp=1298037774954,
 value=333

 3 row(s) in 0.0240 seconds


 And command I use, with the exception I get:

 # hadoop jar 
 /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar -D
 hbase.mapred.tablecolumn=family1:  -input table1 -output /mtestout45 -mapper 
 test-map
 -numReduceTasks 1 -reducer test-reduce -inputformat 
 org.apache.hadoop.hbase.mapred.TableInputFormat

 packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] 
 []
 /tmp/streamjob8218197708173702571.jar tmpDir=null
 11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area
 hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
 Exception in thread main java.lang.RuntimeException: Error in configuring 
 object
        at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:597)
        at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:926)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:918)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:834)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
        at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:767)
        at 
 org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:922)
        at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at 
 org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 23 more
 Caused by: java.lang.NullPointerException
        at 
 org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
        ... 28 more


 Can anyone tell me what I am doing wrong?

 Regards,
 Ondrej



Re: HBase crashes when one server goes down

2011-02-14 Thread Jean-Daniel Cryans
Please use the hbase mailing list for HBase-related questions.

Regarding your issue, we'll need more information to help you out.
Have you checked the logs? If you see exceptions in there, did you
google them trying to figure out what's going on?

Finally, does your setup meet all the requirements?
http://hbase.apache.org/notsoquick.html#requirements

J-D

On Mon, Feb 14, 2011 at 9:49 AM, Rodrigo Barreto rodbarr...@gmail.com wrote:
 Hi,

 We are new to Hadoop. We have just configured a cluster with 3 servers and
 everything is working OK, except when one server goes down: Hadoop / HDFS
 continues working but HBase stops, and queries do not return results
 until we restart HBase. The HBase configuration is copied below, please
 help us.

 ## HBASE-SITE.XML ###

 <configuration>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>master,slave1,slave2</value>
                <description>The directory shared by region servers.
                </description>
        </property>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://master:54310/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.master</name>
                <value>master:6</value>
                <description>The host and port that the HBase master runs
 at.
                </description>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>2</value>
                <description>Default block replication.
                The actual number of replications can be specified when the
 file is created.
                The default is used if replication is not specified in
 create time.
                </description>
        </property>
 </configuration>


 Thanks,

 Rodrigo Barreto.



Re: User History Location

2011-02-11 Thread Jean-Daniel Cryans
For cloudera-related questions, please use their mailing lists.

J-D

2011/2/11 Alexander Schätzle schae...@informatik.uni-freiburg.de:
 Hello,

 I'm a little bit confused about the right key for specifying the User
 History Location in CDH3B3 (which is Hadoop 0.20.2+737). Could anybody
 please give me a short answer as to which key is the right one and which
 configuration file is the right place for it?

 1) mapreduce.job.userhistorylocation ?
 2) hadoop.job.history.user.location ?

 Is the mapred-site.xml the right config-file for this key?

 Thx a lot!

 Best regards,

 Alexander Schätzle
 University of Freiburg, Germany



Re: Single Job to put Data into Hbase+MySQL

2010-10-27 Thread Jean-Daniel Cryans
Do both insertions in your reducer, by either not using the output
formats at all or using one of them and doing the other insert by hand.
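
A rough sketch of the second option, with TableOutputFormat handling the HBase
side and plain JDBC for MySQL (table, column, and connection details are all
placeholders):

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DualSinkReducer extends Reducer<Text, Text, ImmutableBytesWritable, Put> {
  private Connection sql;
  private PreparedStatement insert;

  @Override
  protected void setup(Context ctx) throws IOException {
    try {
      // The MySQL driver jar must ship with the job; URL/credentials are placeholders.
      sql = DriverManager.getConnection("jdbc:mysql://dbhost/logs", "user", "pass");
      insert = sql.prepareStatement("INSERT INTO summary (k, v) VALUES (?, ?)");
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    byte[] row = Bytes.toBytes(key.toString());
    for (Text value : values) {
      // HBase side: emitted Puts are written out by TableOutputFormat.
      Put put = new Put(row);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(value.toString()));
      ctx.write(new ImmutableBytesWritable(row), put);
      try {
        // MySQL side: inserted by hand.
        insert.setString(1, key.toString());
        insert.setString(2, value.toString());
        insert.executeUpdate();
      } catch (SQLException e) {
        throw new IOException(e);
      }
    }
  }

  @Override
  protected void cleanup(Context ctx) throws IOException {
    try {
      insert.close();
      sql.close();
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }
}

The job would then set TableOutputFormat as its output format and point it at
the target table (TableMapReduceUtil has helpers for this).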

J-D

On Wed, Oct 27, 2010 at 1:44 PM, Shuja Rehman shujamug...@gmail.com wrote:
 Hi Folks

 I am wondering if anyone has the answer to this question. I am processing
 log files using MapReduce and need to put some of the resulting data into MySQL
 and the rest into HBase. At the moment, I am running two separate jobs to do
 this, so the same file is read twice to dump the data. My question is: is it
 possible to run a single job to achieve this?

 --
 Regards
 Shuja-ur-Rehman Baig
 http://pk.linkedin.com/in/shujamughal
 Cell: +92 3214207445



Client hanging 20 seconds after job's over (WAS: Re: Can I run HBase 0.20.6 on Hadoop 0.21?)

2010-09-27 Thread Jean-Daniel Cryans
(adding mapreduce-user@ and re-scoping title)

Can you jstack the client while it's waiting those 20 seconds? Is it still
waiting for the job to come back or is it something else? Is the job
itself done cleaning 20 seconds before the call returns on the client
side (check the web ui)?

J-D

On Mon, Sep 27, 2010 at 12:10 PM, Pete Tyler peteralanty...@gmail.com wrote:
 Thanks for the offer, much appreciated. I have a very simple mapreduce job on
 a pseudo-distributed system. I have a very small amount of persisted data.

 Running locally, the mapreduce job runs very quickly, less than three seconds.

 When I run the job against the pseudo-distributed hadoop, still on the same
 machine as the client, then I see the following:
 - the map and reduce classes run very quickly, a matter of millis in total ...
 sweet
 - the client blocks waiting for the job to finish for about 20 seconds ...
 very slow

 I'm trying to understand why I have this 20 second overhead and what I can do 
 about it.

 My map and reduce classes are in my Hadoop classpath.

 On Sep 27, 2010, at 11:32 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:

 Using 0.21.0 may reveal newer bugs rather than fixing your older ones.
 Maybe we can help you debugging 0.20.2, what are you seeing?

 J-D



Re: State of high availability in Hadoop 0.20.1

2010-06-24 Thread Jean-Daniel Cryans
It's the same.

J-D

On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote:
 Just to clarify, I mean the NameNode high availability.

 Regards.

 On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 What is the state of high-availability in Hadoop 0.20.1?

 In Hadoop 0.18.3 the only option was doing DBRD, has anything changed in
 0.20.1?

 Regards.




Re: State of high availability in Hadoop 0.20.1

2010-06-24 Thread Jean-Daniel Cryans
The Backup Namenode will be in 0.21 but it's not a complete NN HA
solution (far from it):

https://issues.apache.org/jira/browse/HADOOP-4539

Dhruba at Facebook has a AvatarNode for 0.20:

https://issues.apache.org/jira/browse/HDFS-976

And the umbrella issue for NN availability is:

https://issues.apache.org/jira/browse/HDFS-1064

J-D

On Thu, Jun 24, 2010 at 10:10 AM, Stas Oskin stas.os...@gmail.com wrote:
 Hi.

 The check-point node is expected to be included in 0.21?

 Regards.

 On Thu, Jun 24, 2010 at 7:47 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 It's the same.

 J-D

 On Thu, Jun 24, 2010 at 9:44 AM, Stas Oskin stas.os...@gmail.com wrote:
  Just to clarify, I mean the NameNode high availability.
 
  Regards.
 
  On Thu, Jun 24, 2010 at 7:43 PM, Stas Oskin stas.os...@gmail.com
 wrote:
 
  Hi.
 
  What is the state of high-availability in Hadoop 0.20.1?
 
  In Hadoop 0.18.3 the only option was doing DBRD, has anything changed in
  0.20.1?
 
  Regards.
 
 




Re: Error opening job jar

2010-06-15 Thread Jean-Daniel Cryans
This isn't a HBase question, this is for mapreduce-user@hadoop.apache.org

J-D

On Tue, Jun 15, 2010 at 8:21 AM, yshintre1982 yshintre1...@yahoo.in wrote:

 I am running the wordcount example on Linux VMware on Hadoop.
 I get the following exception:

 Exception in thread main java.io.IOException: Error opening job jar:
 /usr/yogesh/wordcount.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

 What could be wrong?
 Please help...


 --
 View this message in context: 
 http://old.nabble.com/Error-opening-job-jar-tp28892690p28892690.html
 Sent from the HBase User mailing list archive at Nabble.com.




Re: HBase client hangs after upgrade to 0.20.4 when used from reducer

2010-05-14 Thread Jean-Daniel Cryans
 info to
 pastebin.
 
  I did the following sequence (with HBase 0.20.4):
  - startup HBase (waited for all the regions to come online and let it
  settle)
  - startup our application
  - wait for the importer job to hang (it only happened on the second run,
  which started 15 reducers; the first run was really small and only one
 key
  was generated, so just one reducer)
  - kill the hanging importer job (hadoop job -kill)
  - try to shutdown HBase (as I type this it is still producing dots on my
  console)
 
  The HBase master logs are here (includes shutdown attempt):
  http://pastebin.com/PYpPVcyK
  The jstacks are here:
  - HMaster: http://pastebin.com/Da6jCAuA (this includes two thread
 dumps,
  one during operation with the hanging clients and one during hanging
  shutdown)
  - RegionServer 1: http://pastebin.com/5dQXfxCn
  - RegionServer 2: http://pastebin.com/XWwBGXYC
  - RegionServer 3: http://pastebin.com/mDgWbYGV
  - RegionServer 4: http://pastebin.com/XDR14bth
 
  As you can see in the master logs, the shutdown cannot get a thread
 called
  Thread-10 to stop running. The trace for that thread looks like this:
  Thread-10 prio=10 tid=0x4d218800 nid=0x1e73 in Object.wait()
  [0x427a7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         - waiting on 0x2aaab364c9d0 (a java.lang.Object)
        at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:89)
        - locked 0x2aaab364c9d0 (a java.lang.Object)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:76)
 
  I still have no clue what happened, but I will investigate a bit more
  tomorrow.
 
 
  Thanks for the responses.
 
 
  Friso
 
 
 
  On May 12, 2010, at 9:02 PM, Todd Lipcon wrote:
 
  Hi Friso,
 
  Also, if you can capture a jstack of the regionservers at thie time
  that would be great.
 
  -Todd
 
  On Wed, May 12, 2010 at 9:26 AM, Jean-Daniel Cryans 
 jdcry...@apache.org
  wrote:
  Friso,
 
  Unfortunately it's hard to determine the cause with the provided
  information, the client call you pasted is pretty much normal i.e. the
  client is waiting to receive a result from a region server.
 
  The fact that you can't shut down the master when this happens is very
  concerning. Do you still have those logs around? Same for the region
  servers? Can you post this in pastebin or on a web server?
 
  Also, feel free to come chat with us on IRC, it's always easier to
  debug when live. #hbase on freenode
 
  J-D
 
  On Wed, May 12, 2010 at 8:31 AM, Friso van Vollenhoven
  fvanvollenho...@xebia.com wrote:
  Hi all,
 
  I am using Hadoop (0.20.2) and HBase to periodically import data
 (every
  15 minutes). There are a number of import processes, but generally they
 all
  create a sequence file on HDFS, which is then run through a MapReduce
 job.
  The MapReduce uses the identity mapper (the input file is a Hadoop
 sequence
  file) and a specialized reducer that does the following:
  - Combine the values for a key into one value
  - Do a Get from HBase to retrieve existing values for the same key
  - Combine the existing value from HBase and the new one into one
 value
  again
  - Put the final value into HBase under the same key (thus 'overwrite'
  the existing row; I keep only one version)
 
  After I upgraded HBase to the 0.20.4 release, the reducers sometimes
  start hanging on a Get. When the jobs start, some reducers run to
 completion
  fine, but after a while the last reducers will start to hang. Eventually
 the
  reducers are killed of by Hadoop (after 600 secs).
 
  I did a thread dump for one of the hanging reducers. It looks like
  this:
  main prio=10 tid=0x48083800 nid=0x4c93 in Object.wait()
  [0x420ca000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x2eb50d70 (a
  org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at java.lang.Object.wait(Object.java:485)
        at
  org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
        - locked 0x2eb50d70 (a
  org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at
  org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
        at $Proxy2.get(Unknown Source)
        at
 org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:450)
        at
 org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:448)
        at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:447)
        at
 
 net.ripe.inrdb.hbase.accessor.real.HBaseTableAccessor.get(HBaseTableAccessor.java:36)
        at
 
 net.ripe.inrdb.hbase.store.HBaseStoreUpdater.getExistingRecords(HBaseStoreUpdater.java:101)
        at
 
 net.ripe.inrdb.hbase.store.HBaseStoreUpdater.mergeTimelinesWithExistingRecords(HBaseStoreUpdater.java:60

Re: Enabling Indexing in HBase

2010-05-12 Thread Jean-Daniel Cryans
Yes, you can also create a HBaseConfiguration object and configure it
with those exact configs (that you then provide to HTable).
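
A rough sketch of that for an 0.20-era client (quorum hosts and the table name
are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class NoClasspathConfig {
  public static void main(String[] args) throws Exception {
    // Build the config in code instead of relying on hbase-site.xml on the classpath.
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "mytable");
    // ... use the table as usual ...
  }
}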

J-D

On Wed, May 12, 2010 at 1:22 AM, Michelan Arendse miche...@addynamo.com wrote:
 Thank you. I have added the configuration folder to my client class path and 
 it worked.

 Now I am faced with another issue: since this application will be used in
 ColdFusion, is there a way of making this work without having the
 configuration as part of the class path?

 -Original Message-
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
 Cryans
 Sent: 11 May 2010 06:26 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: Enabling Indexing in HBase

 Per 
 http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview
 your client has to know where your zookeeper setup is. Since you want
 to use HBase in a distributed fashion, that means you went through
 http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib
 and this is where the required configs are.

 It could be made more obvious tho.

 J-D

 On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse miche...@addynamo.com 
 wrote:
 Thanks. I have added that to the class path, but I still get an error.
 This is the error that I get:

 10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=localhost:2181 sessionTimeout=6 
 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
 10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server 
 localhost/127.0.0.1:2181
 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 
 to sun.nio.ch.selectionkeyi...@b0ce8f
 java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during 
 shutdown input

 I'm working off a server and not in standalone mode; where would I change a
 setting that tells the connectString to point to the server instead of
 localhost?

 -Original Message-
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of 
 Jean-Daniel Cryans
 Sent: 10 May 2010 07:05 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: Enabling Indexing in HBase

 Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in
 your class path?

 J-D

 On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com 
 wrote:
 Hi.

 I added the following properties  to hbase-site.xml
 <property>
        <name>hbase.regionserver.class</name>
        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    </property>

    <property>
        <name>hbase.regionserver.impl</name>
        <value>
        org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer
        </value>
    </property>

 I'm using hbase 0.20.3 and when I start hbase now it comes with the 
 following:
 ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
 java.lang.UnsupportedOperationException: Unable to find region server 
 interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.ipc.IndexedRegionInterface

 Can you please help with this problem that I am having.

 Thank you,

 Michelan Arendse
 Junior Developer | AD:DYNAMO // happy business ;-)
 Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

 Advertise Online Instantly - www.addynamo.comhttp://www.addynamo.com 
 http://www.addynamo.com






Re: Enabling Indexing in HBase

2010-05-11 Thread Jean-Daniel Cryans
Per 
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview
your client has to know where your zookeeper setup is. Since you want
to use HBase in a distributed fashion, that means you went through
http://hadoop.apache.org/hbase/docs/r0.20.4/api/overview-summary.html#fully-distrib
and this is where the required configs are.

It could be made more obvious tho.

J-D

On Tue, May 11, 2010 at 4:44 AM, Michelan Arendse miche...@addynamo.com wrote:
 Thanks. I have added that to the class path, but I still get an error.
 This is the error that I get:

 10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=localhost:2181 sessionTimeout=6 
 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@12d15a9
 10/05/11 13:41:27 INFO zookeeper.ClientCnxn: Attempting connection to server 
 localhost/127.0.0.1:2181
 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Exception closing session 0x0 to 
 sun.nio.ch.selectionkeyi...@b0ce8f
 java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
 10/05/11 13:41:28 WARN zookeeper.ClientCnxn: Ignoring exception during 
 shutdown input

 I'm working off a server and not in standalone mode; where would I change a
 setting that tells the connectString to point to the server instead of
 localhost?

 -Original Message-
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
 Cryans
 Sent: 10 May 2010 07:05 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: Enabling Indexing in HBase

 Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in
 your class path?

 J-D

 On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com 
 wrote:
 Hi.

 I added the following properties  to hbase-site.xml
 <property>
        <name>hbase.regionserver.class</name>
        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    </property>

    <property>
        <name>hbase.regionserver.impl</name>
        <value>
        org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer
        </value>
    </property>

 I'm using hbase 0.20.3 and when I start hbase now it comes with the 
 following:
 ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
 java.lang.UnsupportedOperationException: Unable to find region server 
 interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.ipc.IndexedRegionInterface

 Can you please help with this problem that I am having.

 Thank you,

 Michelan Arendse
 Junior Developer | AD:DYNAMO // happy business ;-)
 Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

 Advertise Online Instantly - www.addynamo.comhttp://www.addynamo.com 
 http://www.addynamo.com





Re: Deprecated Table Map in Hbase-0.20.3

2010-05-10 Thread Jean-Daniel Cryans
Two things:

First, TableMap was using the raw type instead of a generic one; this
was fixed in https://issues.apache.org/jira/browse/HBASE-876

Then it wasn't generic enough, so this was filed
https://issues.apache.org/jira/browse/HBASE-1725

That's the explanation. I remember having the same issue when I
migrated my code to 0.20, but it's nothing you can't resolve, just
inspect the compilation error messages and you'll figure it out.

J-D

On Mon, May 10, 2010 at 3:28 AM, bharath v
bharathvissapragada1...@gmail.com wrote:
 Hey folks ,

 I have a small question regarding the TableMap class. I know it is deprecated
 in 0.20.3,

 But the declaration was changed from

 public interface TableMap<K extends WritableComparable, V extends Writable>
 extends Mapper<ImmutableBytesWritable, RowResult, K, V>

  TO

 public interface TableMap<K extends WritableComparable<? super K>, V extends
 Writable>
 extends Mapper<ImmutableBytesWritable, RowResult, K, V> {


 Why is there an additional restriction on K, i.e. the <? super K> bound? Because
 of this my app written for 0.19.3 isn't getting compiled now.

 Any suggestions or comments?

 Thanks



Re: Enabling Indexing in HBase

2010-05-10 Thread Jean-Daniel Cryans
Did you include the jar (contrib/indexed/hbase-0.20.3-indexed.jar) in
your class path?

J-D

On Mon, May 10, 2010 at 6:43 AM, Michelan Arendse miche...@addynamo.com wrote:
 Hi.

 I added the following properties  to hbase-site.xml
 <property>
        <name>hbase.regionserver.class</name>
        <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
    </property>

    <property>
        <name>hbase.regionserver.impl</name>
        <value>
        org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer
        </value>
    </property>

 I'm using hbase 0.20.3 and when I start hbase now it comes with the following:
 ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
 java.lang.UnsupportedOperationException: Unable to find region server 
 interface org.apache.hadoop.hbase.ipc.IndexedRegionInterface
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.ipc.IndexedRegionInterface

 Can you please help with this problem that I am having.

 Thank you,

 Michelan Arendse
 Junior Developer | AD:DYNAMO // happy business ;-)
 Office 0861 Dynamo (0861 396266)  | Fax +27 (0) 21 465 2587

 Advertise Online Instantly - www.addynamo.comhttp://www.addynamo.com 
 http://www.addynamo.com




Re: Got some question for begin HBase (KeyValue, data structure)

2010-05-10 Thread Jean-Daniel Cryans
Inline.

J-D

 1. How can I get the key name from a KeyValue? I
 use Bytes.toString(KeyValue.getKey()) but cannot get any useful return.

The javadoc of this method says: * Do not use unless you have to.
Used internally for compacting and testing.

The row key is given by
http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Result.html#getRow()
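
For example (family and qualifier names are placeholders):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ResultExample {
  // Given a Result from a Get or a Scan, pull out the row key and one cell.
  static String describe(Result r) {
    String row = Bytes.toString(r.getRow());
    byte[] cell = r.getValue(Bytes.toBytes("family1"), Bytes.toBytes("a"));
    return row + " -> " + (cell == null ? "null" : Bytes.toString(cell));
  }
}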

 2. What value do you usually set for the row id?

UUIDs or composite keys like timestamp+some_other_tags

 3. How do you deploy the data structure from the development server to the
 production server?

Copy over the DDL used in the shell.


 I think I need some information or a document on how to design the data
 structure in HBase. Can you share some with me?

Google's Bigtable paper is always a good resource. The wiki has some
tips (check the website). Otherwise you can search this mailing list for
your specific data model; you'll probably find what you need.


 Thanks  Regards,
 Singo



Re: Searching rows for array of key values

2010-05-04 Thread Jean-Daniel Cryans
If your row keys are sorted in a lexicographical way (padded with
zeroes in your case since it's longs) then simply use a scanner:

http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/Scan.html

Configure it with a start and end row key, configure setCaching to the
number of rows you need and it will do a single RPC to fetch
everything very efficiently. The exact response time depends on your
hardware, caching, and data size.
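
A rough sketch with placeholder table/family names, assuming the keys are the
8-byte big-endian form of the longs (which sorts correctly for non-negative
values):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScan {
  static void scanRange(HTable table, long startKey, long stopKey) throws IOException {
    Scan scan = new Scan(Bytes.toBytes(startKey), Bytes.toBytes(stopKey));
    scan.addFamily(Bytes.toBytes("f"));
    scan.setCaching(10000);                 // rows fetched per round trip
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r.getRow() / r.getValue(...)
      }
    } finally {
      scanner.close();
    }
  }
}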

J-D

On Tue, May 4, 2010 at 3:16 PM, atreju n.atr...@gmail.com wrote:
 Hello,



 I am doing research on HBase to see if we can use it efficiently in our company.
 I need to be able to get/scan a list of rows for an array of key values (sorted,
 long type). The array size will be 1,000 to 10,000. The table will have a
 few hundred million rows. What is the most efficient (fastest) way to get
 the list of rows for the requested row key values?



 Thanks.



Re: hbase.client.retries.number = 1 is bad

2010-05-03 Thread Jean-Daniel Cryans
Trunk is a work in progress and the shell was recently redone. This
configuration was set tentatively by the author of that change but, as
you can see, it doesn't work very well! The jira is here
https://issues.apache.org/jira/browse/HBASE-2352

J-D

On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz mkur...@gmail.com wrote:
 Hi!

 I'm using a fresh version of trunk.
 I'm experiencing a problem where the invalid region locations are not
 removed from the cache of HCM.
 I'm only using scanners on the table and I receive the following errors:

 2010-05-03 23:42:52,574 DEBUG
 org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
 internal scanner to startKey at
 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
 2010-05-03 23:42:52,574 DEBUG
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
 hit for row http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg
 in tableName Test5: location server 10.1.3.111:60020, location region
 name 
 Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
 SEVERE: Trying to contact region server 10.1.3.111:60020 for region
 Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
 row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
 but failed after 1 attempts.
 Exceptions:
 java.net.ConnectException: Connection refused

 Which is expected as the 10.1.3.111:60020 regionserver was offline for
 hours at that time.
 The cause of this problem is that I set hbase.client.retries.number to
 1 as I don't like the current retry options.
 In this case the following code at HConnectionManager.java:1061
   callable.instantiateServer(tries != 0);
 will make scanners always use the cache.
 This makes hbase.client.retries.number = 1 an unusable option.

 This is not intentional, am I correct?
 Am I forced to use the retries, or is there an other option?

 Also I would like to ask, when is it a good thing to retry an operation?
 In my experience there exist two kinds of failures
 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
 This can be due to a compaction, in which case we probably need to
 wait for a few seconds.
 Or it can be due to a split, in which case we might need to wait for minutes.
 Either case I would not want my client to wait for such long times
 when I could reschedule other things to do in that time.
 It is also possible that the region has been transferred to another
 regionserver, but that is rare compared to the other cases.

 2) java.net.ConnectException : regionserver is offline
 This is solved as soon as the master can reopen regions on an other
 regionserver, but still can take minutes.
 Anyway this exception is also rare (usually).

 Best regards,
 Miklos



Re: hbase.client.retries.number = 1 is bad

2010-05-03 Thread Jean-Daniel Cryans
Yeah I understand that retries are unusable at that level, but you
still want retries in order to be able to recalibrate the .META. cache
right?

So the semantic here is that 1 retry is in fact 1 try, using the
cached information. https://issues.apache.org/jira/browse/HBASE-2445
is about reviewing those semantics in order to offer something more
tangible to the users rather than a mix of number of retries and
timeouts. Feel free to take a look and even a stab at this issue ;)
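For reference, the knob under discussion is a plain client-side configuration value; a sketch of setting it from code (the values shown are only illustrations, check hbase-default.xml for the real defaults in your version):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class RetriesConfigExample {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      // With 1, the client gets a single try against the cached .META. location
      // and never recalibrates; anything above 1 lets the cache be refreshed.
      conf.setInt("hbase.client.retries.number", 10);
      conf.setInt("hbase.client.pause", 1000); // milliseconds slept between tries
      HTable table = new HTable(conf, "Test5"); // table name taken from the paste above
    }
  }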

J-D

On Mon, May 3, 2010 at 3:25 PM, Miklós Kurucz mkur...@gmail.com wrote:
 This problem is not related to the shell.
 I checked 0.20.3 has the same code HConnectionManager.java:1034, I
 expect that to be broken too.

 Miklos

 2010/5/4 Jean-Daniel Cryans jdcry...@apache.org:
 Trunk is a work in progress and the shell was recently redone. This
 configuration was set tentatively by the author of that change but, as
 you can see, it doesn't work very well! The jira is here
 https://issues.apache.org/jira/browse/HBASE-2352

 J-D

 On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz mkur...@gmail.com wrote:
 Hi!

 I'm using a fresh version of trunk.
 I'm experiencing a problem where the invalid region locations are not
 removed from the cache of HCM.
 I'm only using scanners on the table and I receive the following errors:

 2010-05-03 23:42:52,574 DEBUG
 org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
 internal scanner to startKey at
 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
 2010-05-03 23:42:52,574 DEBUG
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
 hit for row http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg
 in tableName Test5: location server 10.1.3.111:60020, location region
 name 
 Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
 SEVERE: Trying to contact region server 10.1.3.111:60020 for region
 Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
 row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
 but failed after 1 attempts.
 Exceptions:
 java.net.ConnectException: Connection refused

 Which is expected as the 10.1.3.111:60020 regionserver was offline for
 hours at that time.
 The cause of this problem is that I set hbase.client.retries.number to
 1 as I don't like the current retry options.
 In this case the following code at HConnectionManager.java:1061
   callable.instantiateServer(tries != 0);
 will make scanners to always use the cache.
 This makes hbase.client.retries.number = 1 an unusable option.

 This is not intentional, am I correct?
 Am I forced to use the retries, or is there an other option?

 Also I would like to ask, when is it a good thing to retry an operation?
 In my experience there exists two kinds of failures
 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
 This can be due to a compaction, in which case we probably need to
 wait for a few seconds.
 Or it can be due to a split, in which case we might need to wait for 
 minutes.
 Either case I would not want my client to wait for such long times
 when I could reschedule other things to do in that time.
 It is also possible that region has been transfered to an other
 regionserver but that is rare compared to the other cases.

 2) java.net.ConnectException : regionserver is offline
 This is solved as soon as the master can reopen regions on an other
 regionserver, but still can take minutes.
 Anyway this exception is also rare(usually)

 Best regards,
 Miklos





Re: Hbase: GETs are very slow

2010-04-30 Thread Jean-Daniel Cryans
Which version? How much heap was given to HBase?

WRT block caching, I don't see how it could impact uploading in any
way, you should enable it. What was the problem inserting 1B rows
exactly? How were you running the upload?

Are you making sure there's no swap on the machines? That kills java
performance faster than you can say hbase ;)

J-D

On Fri, Apr 30, 2010 at 8:36 AM, Ruben Quintero rfq_...@yahoo.com wrote:
 Hi,

 I have a hadoop/hbase cluster running on 9 machines (only 8 GB RAM, 1 TB 
 drives), and have recently noticed that Gets from Hbase have slowed down 
 significantly. I'd say at this point I'm not getting more than 100/sec when 
 using the Hbase Java API. DFS-wise, there's plenty of space left (using less 
 than 10%), and all of the servers seem okay. The tables use LZO, and have 
 blockcache disabled (we were having problems inserting up to a billion rows 
 with it on, and read in the mailing list somewhere that it might help).

 The primary table has only 4 million rows at the moment. I created a new test 
 table with only 200,000, and it was running 100/sec as well.

 I'm not sure what the problem could be (paging?), or some configuration that 
 can be adjusted?

 Any ideas? I can show our configuration if that's helpful, I just wasn't sure 
 what info would be helpful and what would be extraneous.

 Thanks,

 - Ruben






Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
Yeah more handlers won't do it here since there's tons of calls
waiting on a single synchronized method, I guess the IndexedRegion
should use a pool of HTables instead of a single one in order to
improve indexation throughput.

J-D

On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas c...@email.com wrote:
 Here is the thread dump:

 I cranked up the handlers to 300 just in case and ran 40 mappers that loaded 
 data via thrift. Each node runs its own thrift server. I saw an average of 18 
 rows/sec/mapper with no node using more than 10% CPU and no IO wait. It seems 
 no matter how many mappers I throw the total number of rows/sec doesn't go 
 much above 700 rows/second total, which seems very, very slow to me.

 Here is the thread dump from a node:

 http://pastebin.com/U3eLRdMV

 I do see quite a bit of waiting and some blocking in there, not sure how 
 exactly to interpret it all though.

 thanks for any help!
 -chris

 On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:

 One thing to check is at the peak of your load, run jstack on one of
 the regionservers, and look at the handler threads - if all of them
 are doing something you might be running into handler contention.

 it is basically ultimately IO bound.

 -ryan

 On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas c...@email.com wrote:
 They are all at 100, but none of the regionservers are loaded - most are
 less than 20% CPU. Is this all network latency?

 -chris

 On Apr 29, 2010, at 8:29 PM, Ryan Rawson ryano...@gmail.com wrote:

 Every insert on an indexed table would require at the very least an RPC to a
 different regionserver.  If the regionservers are busy, your request
 could wait in the queue for a moment.

 One param to tune would be the handler thread count.  Set it to 100 at
 least.

 On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas c...@email.com wrote:

 I just finished some testing with JDK 1.6 u17 - so far no performance
 improvements with just changing that. Disabling LZO compression did gain a
 little bit (up to about 30/sec from 25/sec per thread). Turning off indexes
 helped the most - that brought me up to 115/sec @ 2875 total rows a 
 second.
 A single perl/thrift process can load at over 350 rows/sec so its not
 scaling as well as I would have expected, even without the indexes.

 Are the transactional indexes that costly? What is the bottleneck there?
 CPU utilization and network packets went up when I disabled the indexes, I
 don't think those are the bottlenecks for the indexes. I was even able to
 add another 15 insert process (total of 40) and only lost about 10% on a 
 per
 process throughput. I probably could go even higher, none of the nodes are
 above CPU 60% utilization and IO wait was at most 3.5%.

 Each rowkey is unique, so there should not be any blocking on the row
 locks. I'll do more indexed tests tomorrow.

 thanks,
 -chris







 On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:

 Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and
 you
 should be good to go. _18 is a botched release if I ever saw one.

 -Todd

 On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas c...@email.com wrote:

 Hi Stack,

 Thanks for looking. I checked the ganglia charts, no server was at more
 than ~20% CPU utilization at any time during the load test and swap was
 never used. Network traffic was light - just running a count through
 hbase
 shell generates a much higher use. On the server hosting meta
 specifically,
 it was at about 15-20% CPU, and IO wait never went above 3%, was
 usually
 down at near 0.

 The load also died with a thrift timeout on every single node (each
 node
 connecting to localhost for its thrift server), it looks like a
 datanode
 just died and caused every thrift connection to timeout - I'll have to
 up
 that limit to handle a node death.

 Checking logs this appears in the logs of the region server hosting
 meta,
 looks like the dead datanode causing this error:

 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
 DFSOutputStream ResponseProcessor exception  for block
 blk_508630839844593817_11180java.io.IOException: Bad response 1 for
 block
 blk_508630839844593817_11180 from datanode 10.195.150.255:50010
      at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)

 The regionserver log on the dead node, 10.195.150.255 has some more
 errors
 in it:

 http://pastebin.com/EFH9jz0w

 I found this in the .out file on the datanode:

 # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
 linux-amd64 )
 # Problematic frame:
 # V  [libjvm.so+0x62263c]
 #
 # An error report file with more information is saved as:
 # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 #


 There is not a single error in the datanode's log though. Also of note
 -
 this happened well into the test, so the node dying caused the load to
 abort
 but not the prior 

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
The contrib packages don't get as much love as core HBase, so they
tend to be less performant and/or less reliable and/or less well
maintained. In this case the issue doesn't seem that bad since it could just
use a HTablePool, but using IndexedTables will definitely be slower
than straight insert since it writes to 2 tables (the main table and
the index).

J-D

On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas c...@email.com wrote:
 It appears that for multiple simultaneous loads, using the IndexedTables is probably 
 not the best choice?

 -chris

 On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:

 Yeah more handlers won't do it here since there's tons of calls
 waiting on a single synchronized method, I guess the IndexedRegion
 should use a pool of HTables instead of a single one in order to
 improve indexation throughput.

 J-D

 On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas c...@email.com wrote:
 Here is the thread dump:

 I cranked up the handlers to 300 just in case and ran 40 mappers that 
 loaded data via thrift. Each node runs its own thrift server. I saw an 
 average of 18 rows/sec/mapper with no node using more than 10% CPU and no 
 IO wait. It seems no matter how many mappers I throw the total number of 
 rows/sec doesn't go much above 700 rows/second total, which seems very, 
 very slow to me.

 Here is the thread dump from a node:

 http://pastebin.com/U3eLRdMV

 I do see quite a bit of waiting and some blocking in there, not sure how 
 exactly to interpret it all though.

 thanks for any help!
 -chris

 On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:

 One thing to check is at the peak of your load, run jstack on one of
 the regionservers, and look at the handler threads - if all of them
 are doing something you might be running into handler contention.

 it is basically ultimately IO bound.

 -ryan

 On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas c...@email.com wrote:
 They are all at 100, but none of the regionservers are loaded - most are
 less than 20% CPU. Is this all network latency?

 -chris

 On Apr 29, 2010, at 8:29 PM, Ryan Rawson ryano...@gmail.com wrote:

 Every insert on an indexed would require at the very least an RPC to a
 different regionserver.  If the regionservers are busy, your request
 could wait in the queue for a moment.

 One param to tune would be the handler thread count.  Set it to 100 at
 least.

 On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas c...@email.com wrote:

 I just finished some testing with JDK 1.6 u17 - so far no performance
 improvements with just changing that. Disabling LZO compression did 
 gain a
 little bit (up to about 30/sec from 25/sec per thread). Turning of 
 indexes
 helped the most - that brought me up to 115/sec @ 2875 total rows a 
 second.
 A single perl/thrift process can load at over 350 rows/sec so its not
 scaling as well as I would have expected, even without the indexes.

 Are the transactional indexes that costly? What is the bottleneck there?
 CPU utilization and network packets went up when I disabled the 
 indexes, I
 don't think those are the bottlenecks for the indexes. I was even able 
 to
 add another 15 insert process (total of 40) and only lost about 10% on 
 a per
 process throughput. I probably could go even higher, none of the nodes 
 are
 above CPU 60% utilization and IO wait was at most 3.5%.

 Each rowkey is unique, so there should not be any blocking on the row
 locks. I'll do more indexed tests tomorrow.

 thanks,
 -chris







 On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:

 Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 
 and
 you
 should be good to go. _18 is a botched release if I ever saw one.

 -Todd

 On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas c...@email.com wrote:

 Hi Stack,

 Thanks for looking. I checked the ganglia charts, no server was at 
 more
 than ~20% CPU utilization at any time during the load test and swap 
 was
 never used. Network traffic was light - just running a count through
 hbase
 shell generates a much higher use. One the server hosting meta
 specifically,
 it was at about 15-20% CPU, and IO wait never went above 3%, was
 usually
 down at near 0.

 The load also died with a thrift timeout on every single node (each
 node
 connecting to localhost for its thrift server), it looks like a
 datanode
 just died and caused every thrift connection to timeout - I'll have to
 up
 that limit to handle a node death.

 Checking logs this appears in the logs of the region server hosting
 meta,
 looks like the dead datanode causing this error:

 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
 DFSOutputStream ResponseProcessor exception  for block
 blk_508630839844593817_11180java.io.IOException: Bad response 1 for
 block
 blk_508630839844593817_11180 from datanode 10.195.150.255:50010
      at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)

 The regionserver log on teh dead node, 10.195.150.255 has some more

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas c...@email.com wrote:
 Thank you, it is nice to get this help.

 I definitely understand the overhead of writing the index, although it seems 
 much worse than just that overhead would indicate. If I understand you 
 correctly that is because all inserts into an IndexedTable are synchronized 
 on one table? If that was switched to using an HTablePool it would no longer 
 be as severe a bottleneck (performance is about an order of magnitude better 
 without the indexing)?

They are synchronized per region server yes, and it _should_ be better
with a pool since then you can do parallel inserts. Patching it
doesn't seem hard, but maybe I'm missing some finer details since I
usually don't work around that code.
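For what it's worth, this is roughly what going through a pool looks like from client code, using the HTablePool class shipped with 0.20 (table, family and row names are made up, and the pool API is quoted from memory, so double-check it against your version; this is a sketch, not the actual IndexedRegion patch):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.HTablePool;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PooledWriteExample {
    // One pool shared by all threads, keeping up to 10 HTables per table name.
    private static final HTablePool POOL = new HTablePool(new HBaseConfiguration(), 10);

    public static void write(byte[] row, byte[] value) throws Exception {
      HTable table = POOL.getTable("index_table"); // hypothetical table name
      try {
        Put put = new Put(row);
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), value);
        table.put(put);
      } finally {
        POOL.putTable(table); // return it to the pool instead of keeping one shared HTable
      }
    }
  }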


 I'm also using thrift to connect and am wondering if that itself puts an 
 overall limit on scaling? It does seem that no matter how many more mappers 
 and servers I add, even without indexing, I am capped at about 5k rows/sec 
 total. I'm waiting a bit as the table grows so that it is split across more 
 regionservers, hopefully that will help, but as far as I can tell I am not 
 hitting any CPU or IO constraint during my tests.

I don't understand the "I'm also using thrift" and "how many more
mappers" part, are you using Thrift inside a map? Anyways, more
clients won't help since there's a single mega serialization of all
the inserts to the index table per region server. It's normal not to
see any CPU/mem/IO contention since, in this case, it's all about the
speed at which you can process a single row insertion. The rest of the
threads just wait...


 -chris

 I'm also using thrift, and while I am using the
 On Apr 30, 2010, at 3:00 PM, Jean-Daniel Cryans wrote:

 The contrib packages doesn't get as much love as core HBase, so they
 tend to be under performant and/or reliable and/or maintained and/or
 etc. In this case the issue doesn't seem that bad since it could just
 use a HTablePool, but using IndexedTables will definitely be slower
 than straight insert since it writes to 2 tables (the main table and
 the index).

 J-D

 On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas c...@email.com wrote:
 It appears that for multiple simulations loads using the IndexTables 
 probably not the best choice?

 -chris

 On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:

 Yeah more handlers won't do it here since there's tons of calls
 waiting on a single synchronized method, I guess the IndexedRegion
 should use a pool of HTables instead of a single one in order to
 improve indexation throughput.

 J-D

 On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas c...@email.com wrote:
 Here is the thread dump:

 I cranked up the handlers to 300 just in case and ran 40 mappers that 
 loaded data via thrift. Each node runs its own thrift server. I saw an 
 average of 18 rows/sec/mapper with no node using more than 10% CPU and no 
 IO wait. It seems no matter how many mappers I throw the total number of 
 rows/sec doesn't go much above 700 rows/second total, which seems very, 
 very slow to me.

 Here is the thread dump from a node:

 http://pastebin.com/U3eLRdMV

 I do see quite a bit of waiting and some blocking in there, not sure how 
 exactly to interpret it all though.

 thanks for any help!
 -chris

 On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:

 One thing to check is at the peak of your load, run jstack on one of
 the regionservers, and look at the handler threads - if all of them
 are doing something you might be running into handler contention.

 it is basically ultimately IO bound.

 -ryan

 On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas c...@email.com wrote:
 They are all at 100, but none of the regionservers are loaded - most are
 less than 20% CPU. Is this all network latency?

 -chris

 On Apr 29, 2010, at 8:29 PM, Ryan Rawson ryano...@gmail.com wrote:

 Every insert on an indexed would require at the very least an RPC to a
 different regionserver.  If the regionservers are busy, your request
 could wait in the queue for a moment.

 One param to tune would be the handler thread count.  Set it to 100 at
 least.

 On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas c...@email.com wrote:

 I just finished some testing with JDK 1.6 u17 - so far no performance
 improvements with just changing that. Disabling LZO compression did 
 gain a
 little bit (up to about 30/sec from 25/sec per thread). Turning of 
 indexes
 helped the most - that brought me up to 115/sec @ 2875 total rows a 
 second.
 A single perl/thrift process can load at over 350 rows/sec so its not
 scaling as well as I would have expected, even without the indexes.

 Are the transactional indexes that costly? What is the bottleneck 
 there?
 CPU utilization and network packets went up when I disabled the 
 indexes, I
 don't think those are the bottlenecks for the indexes. I was even 
 able to
 add another 15 insert process (total of 40) and only lost about 10% 
 on a per
 process throughput. I probably could go even

Re: Hbase: GETs are very slow

2010-04-30 Thread Jean-Daniel Cryans
So we chatted a bit on IRC, the reason GETs were slower is because
block caching was disabled and all calls were hitting HDFS. I was
confused by the first email as it seemed that for some time it was
still speedy without caching.
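For readers hitting the same thing: block caching is a per-column-family switch on HColumnDescriptor. A sketch of creating a family with it enabled (table and family names are made up; an existing table would have to be disabled and altered instead of created):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BlockCacheExample {
    public static void main(String[] args) throws Exception {
      HTableDescriptor desc = new HTableDescriptor("mytable");   // hypothetical table
      HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("content"));
      family.setBlockCacheEnabled(true); // serve repeated reads from cache, not HDFS
      desc.addFamily(family);
      new HBaseAdmin(new HBaseConfiguration()).createTable(desc);
    }
  }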

I wanted to look at the import issue, but logs weren't available.

J-D

On Fri, Apr 30, 2010 at 10:44 AM, Ruben Quintero rfq_...@yahoo.com wrote:
 We're running 20.3, and it has a 6 GB heap.

 With block caching on, it seems we were running out of memory.  It would 
 temporarily lose a region server (usually when it attempted to split) and 
 that caused a chain reaction when it attempted to recover.  The heap would 
 start to surge and cause a heavy garbage collection. We would have nodes 
 dropping in and out, and getting overloaded when they rejoined. We found a 
 post in a mailing list that recommended turning off block caching, and it ran 
 well after that.

 As for swap, that was my first guess. How can I make sure it's not swapping, 
 or is there a way to see if it is?

 Thanks,

 - Ruben




 
 From: Jean-Daniel Cryans jdcry...@apache.org
 To: hbase-user@hadoop.apache.org
 Sent: Fri, April 30, 2010 12:27:37 PM
 Subject: Re: Hbase: GETs are very slow

 Which version? How much heap was given to HBase?

 WRT block caching, I don't see how it could impact uploading in any
 way, you should enable it. What was the problem inserting 1B rows
 exactly? How were you running the upload?

 Are you making sure there's no swap on the machines? That kills java
 performance faster than you can say hbase ;)

 J-D

 On Fri, Apr 30, 2010 at 8:36 AM, Ruben Quintero rfq_...@yahoo.com wrote:
 Hi,

 I have a hadoop/hbase cluster running on 9 machines (only 8 GB RAM, 1 TB 
 drives), and have recently noticed that Gets from Hbase have slowed down 
 significantly. I'd say at this point I'm not getting more than 100/sec when 
 using the Hbase Java API. DFS-wise, there's plenty of space left (using less 
 than 10%), and all of the servers seem okay. The tables use LZO, and have 
 blockcache disabled (we were having problems inserting up to a billion rows 
 with it on, and read in the mailing list somewhere that it might help).

 The primary table has only 4 million rows at the moment. I created a new 
 test table with only 200,000, and it was running 100/sec as well.

 I'm not sure what the problem could be (paging?), or some configuration that 
 can be adjusted?

 Any ideas? I can show our configuration if that's helpful, I just wasn't 
 sure what info would be helpful and what would be extraneous.

 Thanks,

 - Ruben










Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
Not sure why you are going through thrift if you are already using
java (you want to test thrift's speed because java isn't your main dev
language?) but it will maybe add 1ms or 2, really not that bad. Here
at StumbleUpon we use thrift to get our php website to talk to HBase
and on average we stay under 10ms for random gets. Our machines are
2xi7, 24GB, 4x1TB sata.

My coworker (Stack) pinged the author of the contrib to see if he can
make a patch for your issue.

J-D

On Fri, Apr 30, 2010 at 4:51 PM, Chris Tarnas c...@email.com wrote:

 On Apr 30, 2010, at 4:44 PM, Jean-Daniel Cryans wrote:

 On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas c...@email.com wrote:


 I'm also using thrift to connect and am wondering if that itself puts an 
 overall limit on scaling? It does seem that no matter how many more mappers 
 and servers I add, even without indexing, I am capped at about 5k rows/sec 
 total. I'm waiting a bit as the table grows so that it is split across more 
 regionservers, hopefully that will help, but as far as I can tell I am not 
 hitting any CPU or IO constraint during my tests.

 I don't understand the I'm also using thrift and how many more
 mappers part, you are using Thrift inside a map? Anyways, more
 clients won't help since there's a single mega serialization of all
 the inserts to the index table per region server. It's normal not to
 see any CPU/mem/IO contention since, in this case, it's all about the
 speed at which you can process a single row insertion The rest of the
 threads just wait...


 Sorry - should have been more clear. I'm testing now with normal tables and 
 regionservers and I seem to cap out at about 5-7k rows a second for inserts. 
 My method for doing inserts is to use map reduce on hadoop to launch many 
 insert processes, each process uses the local thrift server on each node to 
 connect to hbase. In this case I hope that other threads can insert at the 
 same time.

 -chris





Re: Hbase Hive

2010-04-30 Thread Jean-Daniel Cryans
Inline (and added hbase-user to the recipients).

J-D

On Thu, Apr 29, 2010 at 9:23 PM, Amit Kumar amkumar@gmail.com wrote:
 Hi Everyone,

 I want to ask about Hbase and Hive.

 Q1 Is there any dialect available which can be used with Hibernate to
 create persistence with Hbase. Has somebody written one. I came across HBql
 at
       www.hbql.com. Can this be used to create a dialect for Hbase?

HBQL queries HBase directly, but it's not SQL-compliant and doesn't
feature relational keywords (since HBase doesn't support them, JOINs
don't scale). I don't know if anybody tried integrating HBQL in
Hibernate... it's still a very young project.


 Q2  Once the data is in there in Hbase. In this link I found that it can be
 used with Hive ( https://issues.apache.org/jira/browse/HIVE-705 ). So the
 question is is it safe enough to use the below architecture for application
 Hibernate -- Dialect for Hbase -- Hbase -- query from Hbase using Hive to
 use MapReduce effectively.

Hive goes on top of HBase, so you can use its query language to mine
HBase tables. Be aware that a MapReduce job isn't meant for live
queries, so issuing them from Hibernate doesn't make much sense...
unless you meant something else and this which case please do give
more details.


 Thanks & Regards
 Amit Kumar



Re: data node stops on slave

2010-04-26 Thread Jean-Daniel Cryans
Looks like your nodes share the same storage (NFS share or SAN?), and
only one DN can serve it (else it would be unmanageable).

J-D

On Mon, Apr 26, 2010 at 3:03 AM, Muhammad Mudassar mudassa...@gmail.com wrote:
 I have posted the problem in common-user but no one replied so now sending
 here to get some help on the issue.

 -- Forwarded message --
 From: Muhammad Mudassar mudassa...@gmail.com
 Date: Fri, Apr 23, 2010 at 4:59 PM
 Subject: data node stops on slave
 To: common-u...@hadoop.apache.org


 Hi

 I am following the tutorial "Running Hadoop On Ubuntu Linux (Multi-Node Cluster)"
 http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
 for configuring a 2-node cluster, but I am facing a problem: the data node on the slave
 machine goes down after some time. Here I am sending the log file of the datanode on the
 slave machine and the log file of the namenode on the master machine; kindly help me to
 solve the issue.

 *Log file of data node on slave machine*

 2010-04-23 17:37:17,690 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = hadoop-desktop/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
 /
 2010-04-23 17:37:19,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: master/10.3.31.221:54310. Already tried 0 time(s).
 2010-04-23 17:37:25,303 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: Registered
 FSDatasetStatusMBean
 2010-04-23 17:37:25,305 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
 2010-04-23 17:37:25,307 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is
 1048576 bytes/s
 2010-04-23 17:37:30,777 INFO org.mortbay.log: Logging to
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
 org.mortbay.log.Slf4jLog
 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Port
 returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
 Opening the listener on 50075
 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer:
 listener.getLocalPort() returned 50075
 webServer.getConnectors()[0].getLocalPort() returned 50075
 2010-04-23 17:37:30,833 INFO org.apache.hadoop.http.HttpServer: Jetty bound
 to port 50075
 2010-04-23 17:37:30,833 INFO org.mortbay.log: jetty-6.1.14
 2010-04-23 17:37:31,242 INFO org.mortbay.log: Started
 selectchannelconnec...@0.0.0.0:50075
 2010-04-23 17:37:31,279 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=DataNode, sessionId=null
 2010-04-23 17:37:36,608 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=DataNode, port=50020
 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
 Responder: starting
 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
 listener on 50020: starting
 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 0 on 50020: starting
 2010-04-23 17:37:36,610 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 1 on 50020: starting
 2010-04-23 17:37:36,611 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 2 on 50020: starting
 2010-04-23 17:37:36,611 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration =
 DatanodeRegistration(hadoop-desktop:50010,
 storageID=DS-463609775-127.0.1.1-50010-1271833984369, infoPort=50075,
 ipcPort=50020)
 2010-04-23 17:37:36,639 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
 10.3.31.220:50010, storageID=DS-463609775-127.0.1.1-50010-1271833984369,
 infoPort=50075, ipcPort=50020)In DataNode.run, data =
 FSDataset{dirpath='/home/hadoop/Desktop/dfs/datahadoop/dfs/data/current'}
 2010-04-23 17:37:36,639 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL
 of 360msec Initial delay: 0msec
 2010-04-23 17:37:36,653 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 17 blocks
 got processed in 6 msecs
 2010-04-23 17:37:36,665 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block
 scanner.
 2010-04-23 17:37:39,641 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action:
 DNA_REGISTER
 2010-04-23 17:37:42,645 WARN
 org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down:
 org.apache.hadoop.ipc.RemoteException:
 org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
 10.3.31.220:50010 is attempting to report storage ID
 DS-463609775-127.0.1.1-50010-1271833984369. Node 10.3.31.221:50010 is
 expected to serve this storage.
    at
 

Re: Get operation in HBase Map-Reduce methods

2010-04-20 Thread Jean-Daniel Cryans
What are the numbers like? Is it 1k rows you need to process? 1M? 10B?
Your question is more about scaling (or the need to).

J-D

On Tue, Apr 20, 2010 at 8:39 AM, Andrey atimerb...@gmx.net wrote:
 Dear All,

 Assumed, I've got a list of rowIDs of a HBase table. I want to get each row by
 its rowID, do some operations with its values, and store the results somewhere
 subsequently. Is there a good way to do this in a Map-Reduce manner?

 As far as I understand, a mapper usually takes a Scan to form inputs. It is
 quite possible to create such a Scan, which contains a lot of RowFilters to be
 EQUAL to a particular rowId. Such a strategy will work for sure, however is
 inefficient, since each filter will be tried to match to each found row.

 So, is there a good Map-Reduce praxis for such kind of situations? (E.g. to 
 make
 a Get operation inside a map() method.) If yes, could you kindly point to a 
 good
 code example?

 Thank you in advance.




Re: Get operation in HBase Map-Reduce methods

2010-04-20 Thread Jean-Daniel Cryans
That can be done in a couple of seconds using the normal HBase client
in a multithreaded process, fed by a message queue if you feel like
it. What were you trying to achieve using MR?
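A rough sketch of that pattern (thread count, table name and the per-row processing are placeholders):

  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;

  public class ParallelGets {
    public static void fetch(final List<byte[]> rowIds) throws Exception {
      final HBaseConfiguration conf = new HBaseConfiguration();
      ExecutorService pool = Executors.newFixedThreadPool(10); // arbitrary thread count
      for (final byte[] rowId : rowIds) {
        pool.submit(new Runnable() {
          public void run() {
            try {
              // HTable is not thread-safe: give each task its own instance,
              // or borrow one from an HTablePool to avoid repeated setup cost.
              HTable table = new HTable(conf, "mytable"); // hypothetical table name
              Result result = table.get(new Get(rowId));
              // ... do the per-row operations and store the output here ...
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.MINUTES);
    }
  }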

J-D

On Tue, Apr 20, 2010 at 12:54 PM, Andrey atimerb...@gmx.net wrote:
 Yes, about 1k rows currently. In the future it may happen to be more: some 
 tens
 of thousands.

 Andrey







Re: About the Log entries?

2010-04-19 Thread Jean-Daniel Cryans
You are reading it wrong. The second line you pasted shows how many
edits where applied to region test17,,1271654370789 without telling
you from which log it was coming from. Your log has edits from all
regions, including META and ROOT if present on that RS.

But, do expect data loss on un-rolled logs since that version of HDFS
doesn't support fsync.

J-D

On Mon, Apr 19, 2010 at 8:54 AM, ChingShen chingshenc...@gmail.com wrote:
 Hi,

   I wrote a sequential put example (300,000 rows, the memstore will not
 reach 64MB) to check how the HLog works.

 2010-04-19 13:51:25,340 INFO org.apache.hadoop.hbase.regionserver.HLog: *
 Roll* /hbase/.logs/52-0980216-01,48562,1271656125926/hlog.dat.1271656125952,
 entries=*29*, calcsize=63753517, filesize=32619925. New hlog
 /hbase/.logs/52-0980216-01,48562,1271656125926/hlog.dat.1271656285337

 After I enter the kill -9 master_pid command and restart hbase:

 2010-04-19 13:53:57,578 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added hdfs://localhost/hbase/test17/955259787/content/3876923764760772557,
 entries=*291065*, sequenceid=32230123, memsize=48.9m, filesize=15.0m to
 test17,,1271654370789

 But why can I only get *291065* rather than *29* rows? data loss?

 Thanks.

 Shen



Re: Performance Evaluation randomRead failures after 20% of execution

2010-04-19 Thread Jean-Daniel Cryans
Not sure where to start, there are so many things wrong with your cluster. ;)

Commodity hardware is usually more than 1 cpu, and HBase itself
requires 1GB of RAM. Looking at slave2 for example, your datanode,
region server and MR processes are all competing for 512MB of RAM and
1 CPU. In the log lines you pasted, the more important stuff is:

2010-04-17 19:11:20,864 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 15430ms, ten times longer than scheduled: 1000

That means the JVM was pausing (because of GC, or swapping, or most
probably both) and becomes unresponsive. If you really wish to run
processing on that cluster, I would use the master and slave1 as
datanode and region servers then slave2 as MapReduce only. Also slave1
should have the Namenode, HBase Master and Zookeeper since it has more
RAM. Then I would configure the heaps so that I wouldn't swap, and
configure only 1 map and 1 reduce (not the default of 2).

But still, I wouldn't expect much processing juice out of that.

J-D

On Sat, Apr 17, 2010 at 8:13 PM, jayavelu jaisenthilkumar
joysent...@gmail.com wrote:
 Hi guys,
               I successfully configured hadoop, mapreduce and hbase.
 Now I want to run the Performance Evaluation a bit.

 The configuration of our systems are

 Master Machine:

 Processor:
     Intel Centrino Mobile Technology Processor 1.66 GHz CPUs
 Memory:
    1 GB/Go DDR2 SDRAM
 Storage:
    80 GB/Go
 Network:
    Gigabit Ethernet

 Slave 1 Machine:

 Processor:
     Core 2 Duo Intel T5450 Processor 1.66 GHz CPUs
 Memory:
    2 GB/Go DDR2 SDRAM
 Storage:
    200 GB/Go
 Network:
    Gigabit Ethernet

 Slave 2 Machine:

 Processor:
     Intel(R) Pentium(R) M processor 1400MHZ
 Memory:
    512 MB RAM
 Storage:
    45 GB
 Network:
    Gigabit Ethernet

 The Performance Evaluation algorithms sequentialWrite and
 sequentialRead ran successfully.

 We followed the same procedure for randomWrite and randomRead.

 randomWrite was successful but randomRead failed. See the output
 below for the randomRead. (The CPU/memory usage was 94%, is that the
 reason??)

 had...@hadoopserver:~/hadoop-0.20.1/bin ./hadoop
 org.apache.hadoop.hbase.PerformanceEvaluation randomRead 3
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51
 GMT
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:host.name=Hadoopserver
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.version=1.6.0_15
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.vendor=Sun Microsystems Inc.
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.home=/usr/java/jdk1.6.0_15/jre
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.class.path=/home/hadoop/hadoop-0.20.1/bin/../conf:/usr/java/jdk1.6.0_15/lib/tools.jar:/home/hadoop/hadoop-0.20.1/bin/..:/home/hadoop/hadoop-0.20.1/bin/../hadoop-0.20.1-core.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-codec-1.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-logging-1.0.4.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jetty-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jetty-util-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-0.20.1/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/home/hadoop/hbase-0.20.3/hbase-0.20.3.jar:/home/hadoop/hbase-0.20.3/conf:/home/hadoop/hbase-0.20.3/hbase-0.20.3-test.jar:/home/hadoop/hbase-0.20.3/lib/zookeeper-3.2.2.jar
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.library.path=/home/hadoop/hadoop-0.20.1/bin/../lib/native/Linux-i386-32
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.io.tmpdir=/tmp
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client
 environment:java.compiler=NA
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
 10/04/17 17:58:08 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
 

Re: hitting xceiverCount limit (2047)

2010-04-13 Thread Jean-Daniel Cryans
Sujee,

How many regions do you have and how many families per region? Looks
like your datanodes have to keep a lot of xcievers opened.

J-D

On Tue, Apr 13, 2010 at 9:03 PM, Sujee Maniyam su...@sujee.net wrote:
 Thanks Stack.
 Do I also need to tweak timeouts?  right now they are at default
 values for both hadoop / hbase

 http://sujee.net



 On Tue, Apr 13, 2010 at 11:40 AM, Stack st...@duboce.net wrote:
 Looks like you'll have to up your xceivers or up the count of hdfs nodes.
 St.Ack

 On Tue, Apr 13, 2010 at 11:37 AM, Sujee Maniyam su...@sujee.net wrote:
 Hi all,

 I have been importing a bunch of data into my hbase cluster, and I see
 the following error:

 Hbase error :
 hdfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Bad connect ack with firstBadLink A.B.C.D

 Hadoop data node error:
     DataXceiver : java.io.IOException: xceiverCount 2048 exceeds the
 limit of concurrent xcievers 2047


 I have configured dfs.datanode.max.xcievers = 2047 in
 hadoop/conf/hdfs-site.xml

 Config:
 amazon ec2   c1.xlarge instances (8 CPU, 8G RAM)
 1 master + 4 region servers
 hbase heap size = 3G


 Upping the xcievers count, is an option.  I want to make sure if I
 need to tweak any other parameters to match this.

 thanks
 Sujee
 http://sujee.net





Re: hitting xceiverCount limit (2047)

2010-04-13 Thread Jean-Daniel Cryans
Exactly what you think: since all the xcievers are full, HBase
cannot write to HDFS, so the files cannot be persisted. This usually
ends up shutting the RS since we don't want to mess things up even
more. Then the master does a log replay to recover edits that were in
the memstore.

7k regions is too much for that cluster. Every region has at least one
file for the .regioninfo plus a bunch of other for the store files of
the column family (at least 1 file). There's one xceiver per block
being served (a block is 64MB) so with only 8k xceivers you simply
cannot support that many regions. Is your table LZOed? If not, do
consider it!

J-D

On Tue, Apr 13, 2010 at 10:42 PM, Sujee Maniyam su...@sujee.net wrote:
 J-D,
 - about 7000 regions (spread over 4 region servers).
 - one column family.
 - each row is about 1kbytes
 - 400M rows

 when the xciever limit is hit, I see the following errors on master log

 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.IOException: Bad connect ack with
 firstBadLink 10.210.X.Y:50010
 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block
 blk_3157562535002015020_4324755

 what exactly does 'abandoning block' mean?

 thanks
 Sujee

 http://sujee.net



 On Tue, Apr 13, 2010 at 12:23 PM, Jean-Daniel Cryans
 jdcry...@apache.org wrote:
 Sujee,

 How many regions do you have and how many families per region? Looks
 like your datanodes have to keep a lot of xcievers opened.

 J-D

 On Tue, Apr 13, 2010 at 9:03 PM, Sujee Maniyam su...@sujee.net wrote:
 Thanks Stack.
 Do I also need to tweak timeouts?  right now they are at default
 values for both hadoop / hbase

 http://sujee.net



 On Tue, Apr 13, 2010 at 11:40 AM, Stack st...@duboce.net wrote:
 Looks like you'll have to up your xceivers or up the count of hdfs nodes.
 St.Ack

 On Tue, Apr 13, 2010 at 11:37 AM, Sujee Maniyam su...@sujee.net wrote:
 Hi all,

 I have been importing a bunch of data into my hbase cluster, and I see
 the following error:

 Hbase error :
 hdfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Bad connect ack with firstBadLink A.B.C.D

 Hadoop data node error:
     DataXceiver : java.io.IOException: xceiverCount 2048 exceeds the
 limit of concurrent xcievers 2047


 I have configured dfs.datanode.max.xcievers = 2047 in
 hadoop/conf/hdfs-site.xml

 Config:
 amazon ec2   c1.xlarge instances (8 CPU, 8G RAM)
 1 master + 4 region servers
 hbase heap size = 3G


 Upping the xcievers count, is an option.  I want to make sure if I
 need to tweak any other parameters to match this.

 thanks
 Sujee
 http://sujee.net







Re: Why does throw java.io.IOException when I run a job?

2010-04-12 Thread Jean-Daniel Cryans
Did you restart Hadoop after changing the configs? If you get the
error it means that it wasn't picked up, so there aren't that many
things to check (checks that only you can do).

J-D

On Mon, Apr 12, 2010 at 4:28 AM, 无名氏 sitong1...@gmail.com wrote:
 hi

 I received an IOException when I run a job ...
 java.io.IOException: xceiverCount 257 exceeds the limit of concurrent
 xcievers 256
    at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:619)

 But I have configured dfs.datanode.max.xcievers to 4096.

 *core-site.xml*
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/home/${user.name}/tmp/hadoop</value>
  <description>A base for other temporary directories.</description>
 </property>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://search9b.cm3:9000</value>
 </property>
 <property>
 *<name>dfs.datanode.max.xcievers</name>
 <values>4096</values>*
 </property>
 <property>
 <name>fs.inmemory.size.mb</name>
 <values>200</values>
 </property>
 <property>
 <name>io.sort.factor</name>
 <values>100</values>
 </property>
 <property>
 <name>io.sort.mb</name>
 <values>200</values>
 </property>
 <property>
 <name>io.file.buffer.size</name>
 <values>131072</values>
 </property>
 <property>
 <name>mapred.job.tracker.handler.count</name>
 <values>60</values>
 </property>
 <property>
 <name>mapred.reduce.parallel.copies</name>
 <values>50</values>
 </property>
 <property>
 <name>tasktracker.http.threads</name>
 <values>50</values>
 </property>
 <property>
 <name>mapred.child.java.opts</name>
 <values>-Xmx1024M</values>
 </property>
 </configuration>

 *hdfs-site.xml*
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!-- Put site-specific property overrides in this file. -->
 <configuration>
 <property>
 <name>dfs.data.dir</name>
 <value>dfs/data</value>
 </property>
 <property>
 <name>dfs.name.dir</name>
 <value>dfs/name</value>
 </property>
 <property>
 *<name>dfs.datanode.max.xcievers</name>
 <values>4096</values>*
 </property>
 <property>
 <name>dfs.namenode.handler.count</name>
 <values>40</values>
 </property>
 <property>
 <name>dfs.datanode.handler.coun</name>
 <values>9</values>
 </property>
 </configuration>

 thks.



Re: set number of map tasks for HBase MR

2010-04-11 Thread Jean-Daniel Cryans
A map against a HBase table by default cannot have more tasks than the
number of regions in that table.

Also you want to enable scanner caching. Pass a Scan object to the
TableMapReduceUtil.initTableMapperJob that is configured with
scan.setCaching(some_value) where the value should be the number of
rows to fetch every time we hit a region server with next(). On rows
of 100-200 bytes, our jobs usually are configured with 1000 up to
1.

Finally, is your job running in local mode or on a job tracker? Even
if HBase uses HDFS, it usually doesn't know of the job tracker unless
you configure HBase's classpath with Hadoop's conf.
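Putting those pieces together, a skeleton of the job setup (mapper body, table name and caching value are placeholders; the initTableMapperJob signature shown is the one from the 0.20 org.apache.hadoop.hbase.mapreduce package, double-check it against your jars):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;

  public class ScanJobExample {
    static class MyMapper extends TableMapper<ImmutableBytesWritable, Text> {
      protected void map(ImmutableBytesWritable row, Result value, Context context) {
        // per-row work goes here
      }
    }

    public static void main(String[] args) throws Exception {
      Scan scan = new Scan();
      scan.setCaching(1000); // rows per next() RPC, tune to your row size
      Job job = new Job(new HBaseConfiguration(), "scan-mytable");
      TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
          ImmutableBytesWritable.class, Text.class, job);
      job.setNumReduceTasks(0); // map-only in this sketch
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }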

J-D

On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
cryp...@mail.saturnfans.com wrote:
 Hi,

 thanks for quick response. I tried to do following in the code:

 job.getConfiguration().setInt(mapred.map.tasks, 1);

 but unfortunately have the same result.

 Any other ideas?

 --- ama...@gmail.com wrote:

 From: Amandeep Khurana ama...@gmail.com
 To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
 Subject: Re: set number of map tasks for HBase MR
 Date: Sat, 10 Apr 2010 20:04:18 -0700

 You can set the number of map tasks in your job config to a big number (eg:
 10), and the library will automatically spawn one map task per region.

 -ak


 Amandeep Khurana
 Computer Science Graduate Student
 University of California, Santa Cruz


 On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko 
 cryp...@mail.saturnfans.com wrote:

 Hi guys,

 I have an HBase table of about 8G and I want to run an MR job against it. It works
 extremely slowly in my case. One thing I noticed is that the job runs only 2 map
 tasks. Is there any way to set up a bigger number of map tasks? I saw some method
 in the mapred package, but can't find anything like this in the new mapreduce
 package.

 I run my MR job on a single machine in cluster mode.


 _
 Sign up for your free SaturnFans email account at
 http://webmail.saturnfans.com/





 _
 Sign up for your free SaturnFans email account at 
 http://webmail.saturnfans.com/



Re: Region not getting served

2010-04-11 Thread Jean-Daniel Cryans
Exactly which version of hbase are you using? According to my digging
of HStoreKey's SVN history to match the row numbers, you seem to be on
the 0.19 branch. Any reason you are using something that's a year old
compared to 0.20.3 which was released last January?

BTW your splitting isn't wrong, the region server is trying to parse
the column family and there's something null where it shouldn't be.

J-D

On Sun, Apr 11, 2010 at 10:55 AM, john smith js1987.sm...@gmail.com wrote:
 Hi all,

 I wrote my own getSplits() function for HBase-MR. 'A' is a table involved
 in the MR. I am getting the following stack trace. It seems that it couldn't
 access the region. But my region server is up and running. Does it indicate
 that my splitting is wrong?

 http://pastebin.com/YBK4JQBu

 Thanks
 j.S



Re: set number of map tasks for HBase MR

2010-04-11 Thread Jean-Daniel Cryans
Yes an option could be added, along with a write buffer option for Import.
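For anyone wiring that up by hand in the meantime, the write buffer being referred to is the plain HTable client-side buffer; a minimal sketch (the buffer size and table/row names are arbitrary example values):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedWriteExample {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable"); // hypothetical table
      table.setAutoFlush(false);                  // buffer puts client-side
      table.setWriteBufferSize(12 * 1024 * 1024); // 12MB, an example value only
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        table.put(put); // only goes to the servers when the buffer fills up
      }
      table.flushCommits(); // push whatever is left in the buffer
    }
  }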

J-D

On Sun, Apr 11, 2010 at 3:30 PM, Ted Yu yuzhih...@gmail.com wrote:
 I noticed mapreduce.Export.createSubmittableJob() doesn't call setCaching()
 in 0.20.3

 Should call to setCaching() be added ?

 Thanks

 On Sun, Apr 11, 2010 at 2:14 AM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 A map against a HBase table by default cannot have more tasks than the
 number of regions in that table.

 Also you want to enable scanner caching. Pass a Scan object to the
 TableMapReduceUtil.initTableMapperJob that is configured with
 scan.setCaching(some_value) where the value should be the number of
 rows to fetch every time we hit a region server with next(). On rows
 of 100-200 bytes, our jobs usually are configured with 1000 up to
 1.

 Finally, is your job running in local mode or on a job tracker? Even
 if HBase uses HDFS, it usually doesn't know of the job tracker unless
 you configure HBase's classpath with Hadoop's conf.

 J-D

 On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
 cryp...@mail.saturnfans.com wrote:
  Hi,
 
  thanks for quick response. I tried to do following in the code:
 
  job.getConfiguration().setInt(mapred.map.tasks, 1);
 
  but unfortunately have the same result.
 
  Any other ideas?
 
  --- ama...@gmail.com wrote:
 
  From: Amandeep Khurana ama...@gmail.com
  To: hbase-user@hadoop.apache.org, cryp...@mail.saturnfans.com
  Subject: Re: set number of map tasks for HBase MR
  Date: Sat, 10 Apr 2010 20:04:18 -0700
 
  You can set the number of map tasks in your job config to a big number
 (eg:
  10), and the library will automatically spawn one map task per
 region.
 
  -ak
 
 
  Amandeep Khurana
  Computer Science Graduate Student
  University of California, Santa Cruz
 
 
  On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko 
  cryp...@mail.saturnfans.com wrote:
 
  Hi guys,
 
  I have about 8G Hbase table  and I want to run MR job against it. It
 works
  extremely slow in my case. One thing I noticed is that job runs only 2
 map
  tasks. Is it any way to setup bigger number of map tasks? I sow some
 method
  in mapred package, but can't find anything like this in new mapreduce
  package.
 
  I run my MR job one a single machine in cluster mode.
 
 
  _
  Sign up for your free SaturnFans email account at
  http://webmail.saturnfans.com/
 
 
 
 
 
  _
  Sign up for your free SaturnFans email account at
 http://webmail.saturnfans.com/
 




Re: Region not getting served

2010-04-11 Thread Jean-Daniel Cryans
WRT the original problem, I only see the result and not the code or
anything else. Help me help you. (but it's probably better in 0.20,
hence why I suggest upgrading)

Text implements WritableComparable, so it's not your problem.
TextArrayWritable is not in the 0.20 branch IIRC, that should be the
problem.

J-D

On Sun, Apr 11, 2010 at 7:32 PM, john smith js1987.sm...@gmail.com wrote:
 J.D,


 I tried working with the 0.20+ branch of hadoop and Hbase. I changed my
 build paths in eclipse and I found out the following errors

 public class MyTableMap extends MapReduceBase
 implements TableMap<Text, TextArrayWritable> {


 It is saying that the Text in that position must extend WritableComparable, which
 is true for the hadoop 0.19 branch, whereas it shows errors for the 0.20+
 branch because the Text class extends the BinaryComparable class. Any solution to
 this, or to the original problem (as you said, some problem
 with the parsing)?

 Kindly help me

 Thanks


 On Sun, Apr 11, 2010 at 6:07 PM, john smith js1987.sm...@gmail.com wrote:


 J.D.

 Thanks for replying. My hbase version is 0.19.3. Because I wrote a lot of code
 for this version, I haven't updated it.
 Also I'll check if there's any problem with my column family naming, such as a
 missing ':' etc., and I'll let you know.

 Thanks


 On Sun, Apr 11, 2010 at 5:10 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 Exactly which version of hbase are you using? According to my digging
 of HStoreKey's SVN history to match the row numbers, you seem to be on
 the 0.19 branch. Any reason you are using something that's a year old
 compared to 0.20.3 which was released last January?

 BTW your splitting isn't wrong, the region server is trying to parse
 the column family and there's something null where it shouldn't be.

 J-D

 On Sun, Apr 11, 2010 at 10:55 AM, john smith js1987.sm...@gmail.com
 wrote:
  Hi all,
 
  I wrote my own getSplits() function for HBase-MR . A is a table
 involved
  in MR . I am getting the following stack trace. It seems that it
 couldn't
  access the region. But my region server is up and running. Does it
 indicate
  that my splitting is wrong?
 
  http://pastebin.com/YBK4JQBu
 
  Thanks
  j.S
 






Re: HTable Client RS caching

2010-04-08 Thread Jean-Daniel Cryans
On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote:
 Occasionally my HTable clients get a response that no server is serving
 a particular region...
 Normally, the region is back a few seconds later (perhaps a split?).

Or the region moved.


 Anyway, the client (Using HTablePool) seems to need a restart to forget
 this.

Seems wrong, would love a stack trace.


 Is there a config value to manipulate the caching time of regionserver
 assignments in the client?

Nope, when the client sees a NSRE, it queries .META. to find the new location.


 I set a small value for hbase.client.pause to get failures fast. I am
 using 0.20.3 .

Splits are still kinda slow, taking at least 2 seconds to happen, but
finding the new location of a region is a core feature in HBase and
it's rather well tested. Can you pin down your exact problem? Next
time an NSRE happens, see which region it was looking for and grep the
master log for it; you should see the history and how much time it
took to move.
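
For example, something along these lines (region name and log path are
made up, yours will differ):

  grep 'my_table,,1270600690648' /path/to/hbase/logs/hbase-hadoop-master-yourhost.log

should show when the region was closed, reassigned and reopened.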


 Thx,

  Al



Re: Received RetriesExhaustedException when write to hbase table, or received WrongRegionException when read from hbase table.

2010-04-08 Thread Jean-Daniel Cryans
Without knowing what happened, it's hard to propose a cure...

Anyway, restarting the cluster normally takes care of such a situation,
or you can recreate all the .META. entries by running bin/add_table.rb
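
(Usage is roughly: ./bin/hbase org.jruby.Main bin/add_table.rb /hbase/my_table
where /hbase/my_table is that table's directory in HDFS; adjust the paths to
your install and double-check the comments at the top of the script itself.)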

J-D

2010/4/7 无名氏 sitong1...@gmail.com:
 I am anxious to know how to repair the region, or recreate the region so writes
 can continue.
 No need to recover the data.

 thks


 2010/4/8 Jean-Daniel Cryans jdcry...@apache.org

 I would also like to know why your region server went bad, but I'm
 missing a lot of information here ;) Like the version of hadoop/hbase,
 size of your cluster, the hardware, what/how much are you trying to
 insert, and definitely some master and region server logs either in a
 pastebin or on a web server, not directly into the email.

 Thx,

 J-D

 On Wed, Apr 7, 2010 at 1:33 AM, 无名氏 sitong1...@gmail.com wrote:
  I suspect some region server has gone bad.
 
  When I write a record to the HBase table, it throws RetriesExhaustedException:
 
  Exception in thread main
  org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
  contact region server Some server, retryOnlyOne=true, index=0,
  islastrow=true, tries=9, numtries=10, i=0, listsize=1,
 
 region=web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993
  for region
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993,
  row
 'r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19',
  but failed after 10 attempts.
  Exceptions:
 at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1120)
 at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
 at
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
 at storage.client.FeedSchema.flushCommits(FeedSchema.java:72)
 
  When I read info from the HBase table:
  org.apache.hadoop.hbase.regionserver.WrongRegionException:
  org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row
 out
  of range for HRegion
 
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993,
 
 startKey='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1',
 
 getEndKey()='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D643994',
 
 row='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19'
 at
  org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1522)
 at
 
 org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1554)
 at
  org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1622)
 at
  org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2285)
 at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1788)
 at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
 at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
  org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
 at
  org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 
  I get META info through hbase shell.
  command:
   get '.META.',
 
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993
  result :
  COLUMN
  CELL
 
   info:regioninfo timestamp=1270529567780, value=REGION =
 {NAME
  = 'web_info,r:http:\\x2F\\x2Fcom.
 
 
 ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D
  \\x26page\\x3D1,1270529565993', STARTKEY =
  'r:http:\\x2F\\x2Fcom.ccidnet.linux.b
 
 
 bs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D\\x26page\\x3D1
  ', ENDKEY =
  'r:http:\\x2F\\x2Fcom.ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D643
  994', ENCODED = 1771513916, TABLE = {{NAME
 =
  'web_info', FAMILIES = [{NAME =
   'article_dedup', VERSIONS = '2',
 COMPRESSION
  = 'NONE', TTL = '2147483647', BL
  OCKSIZE = '65536', IN_MEMORY = 'false',
  BLOCKCACHE = 'true'}, {NAME = 'dedup'
  , VERSIONS = '2', COMPRESSION = 'NONE', TTL
  = '2147483647', BLOCKSIZE = '6553
  6', IN_MEMORY = 'false', BLOCKCACHE =
  'true'}, {NAME = 'global', VERSIONS = '
  2', COMPRESSION = 'NONE', TTL =
 '2147483647',
  BLOCKSIZE = '65536', IN_MEMORY =
   'true', BLOCKCACHE = 'true'}, {NAME =
  'page_type', VERSIONS = '2

Re: HTable Client RS caching

2010-04-08 Thread Jean-Daniel Cryans
No it's there: domaincrawltable,,1270600690648

J-D

On Thu, Apr 8, 2010 at 10:38 AM, Ted Yu yuzhih...@gmail.com wrote:
 What if there is no region information in NSRE ?

 2010-04-08 10:26:38,385 ERROR [IPC Server handler 60 on 60020]
 regionserver.HRegionServer(846): Failed openScanner
 org.apache.hadoop.hbase.NotServingRegionException:
 domaincrawltable,,1270600690648
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1893)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)


 On Thu, Apr 8, 2010 at 9:39 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 On Wed, Apr 7, 2010 at 11:38 PM, Al Lias al.l...@gmx.de wrote:
  Occasionally my HTable clients get a response that no server is serving
  a particular region...
  Normally, the region is back a few seconds later (perhaps a split?).

 Or the region moved.

 
  Anyway, the client (Using HTablePool) seems to need a restart to forget
  this.

 Seems wrong, would love a stack trace.

 
  Is there a config value to manipulate the caching time of regionserver
  assignments in the client?

 Nope, when the client sees a NSRE, it queries .META. to find the new
 location.

 
  I set a small value for hbase.client.pause to get failures fast. I am
  using 0.20.3 .

  Splits are still kinda slow, taking at least 2 seconds to happen, but
  finding the new location of a region is a core feature in HBase and
  it's rather well tested. Can you pin down your exact problem? Next
  time an NSRE happens, see which region it was looking for and grep the
  master log for it; you should see the history and how much time it
  took to move.

 
  Thx,
 
   Al
 




Re: Received RetriesExhaustedException when write to hbase table, or received WrongRegionException when read from hbase table.

2010-04-07 Thread Jean-Daniel Cryans
I would also like to know why your region server went bad, but I'm
missing a lot of information here ;) Like the version of hadoop/hbase,
size of your cluster, the hardware, what/how much are you trying to
insert, and definitely some master and region server logs either in a
pastebin or on a web server, not directly into the email.

Thx,

J-D

On Wed, Apr 7, 2010 at 1:33 AM, 无名氏 sitong1...@gmail.com wrote:
 I suspect some region server has gone bad.

 When I write a record to the HBase table, it throws RetriesExhaustedException:

 Exception in thread main
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
 contact region server Some server, retryOnlyOne=true, index=0,
 islastrow=true, tries=9, numtries=10, i=0, listsize=1,
 region=web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993
 for region 
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993,
 row 
 'r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19',
 but failed after 10 attempts.
 Exceptions:
        at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1120)
        at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
        at storage.client.FeedSchema.flushCommits(FeedSchema.java:72)

 When I read info from the HBase table:
 org.apache.hadoop.hbase.regionserver.WrongRegionException:
 org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out
 of range for HRegion
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993,
 startKey='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1',
 getEndKey()='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D643994',
 row='r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D665064\x26page\x3De\x26fpage\x3D19'
        at
 org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1522)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1554)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1622)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2285)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1788)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 I get META info through hbase shell.
 command:
  get '.META.',
 web_info,r:http:\x2F\x2Fcom.ccidnet.linux.bbs\x2Fread.php\x3Ftid\x3D593055\x26fpage\x3D0\x26toread\x3D\x26page\x3D1,1270529565993
 result :
 COLUMN
 CELL

  info:regioninfo             timestamp=1270529567780, value=REGION = {NAME
 = 'web_info,r:http:\\x2F\\x2Fcom.

 ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D
                             \\x26page\\x3D1,1270529565993', STARTKEY =
 'r:http:\\x2F\\x2Fcom.ccidnet.linux.b

 bs\\x2Fread.php\\x3Ftid\\x3D593055\\x26fpage\\x3D0\\x26toread\\x3D\\x26page\\x3D1
                             ', ENDKEY =
 'r:http:\\x2F\\x2Fcom.ccidnet.linux.bbs\\x2Fread.php\\x3Ftid\\x3D643
                             994', ENCODED = 1771513916, TABLE = {{NAME =
 'web_info', FAMILIES = [{NAME =
                              'article_dedup', VERSIONS = '2', COMPRESSION
 = 'NONE', TTL = '2147483647', BL
                             OCKSIZE = '65536', IN_MEMORY = 'false',
 BLOCKCACHE = 'true'}, {NAME = 'dedup'
                             , VERSIONS = '2', COMPRESSION = 'NONE', TTL
 = '2147483647', BLOCKSIZE = '6553
                             6', IN_MEMORY = 'false', BLOCKCACHE =
 'true'}, {NAME = 'global', VERSIONS = '
                             2', COMPRESSION = 'NONE', TTL = '2147483647',
 BLOCKSIZE = '65536', IN_MEMORY =
                              'true', BLOCKCACHE = 'true'}, {NAME =
 'page_type', VERSIONS = '2', COMPRESSI
                             ON = 'NONE', TTL = '2147483647', BLOCKSIZE =
 '65536', IN_MEMORY = 'false', BL
                             OCKCACHE = 'true'}, {NAME = 'parser',
 VERSIONS = '2', COMPRESSION = 'GZ', TTL
                              = '2147483647', BLOCKSIZE = '65536',
 IN_MEMORY = 'false', BLOCKCACHE = 'true
                             '}, {NAME = 'pid_match', VERSIONS = '2',
 COMPRESSION = 'NONE', TTL = '2147483
                             647', BLOCKSIZE = '65536', IN_MEMORY =
 

Re: HBase always corrupted

2010-04-07 Thread Jean-Daniel Cryans
At StumbleUpon we have north of 20 billion rows, each of 100-200 bytes.

Look in your datanode log for

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5

or that

http://wiki.apache.org/hadoop/Hbase/FAQ#A6
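
(In short: those usually come down to raising dfs.datanode.max.xcievers on the
datanodes and the open-file limit for the user running them. The values below
are examples only, check the wiki pages above for the details:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

in hdfs-site.xml, plus something like "hadoop - nofile 32768" in
/etc/security/limits.conf, then restart the datanodes.)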

J-D

On Wed, Apr 7, 2010 at 9:55 AM, Geoff Hendrey ghend...@decarta.com wrote:
 Hi,

 I am running an HBase instance in a pseudocluster mode, on top of a
 pseudoclustered HDFS, on a single machine. I have a 10 node map/reduce
 cluster that is using a TableMapper to drive a map/reduce job. In the
 map phase, two Gets are executed against against HBase. The Map phase
 generates two orders of magnitude more data than was pumped in, and in
 the reduce phase we do some consolidation of the generated data, then
 execute a Put into HBase with autocomit=false, and the batch size set to
 100,000 (I tried 1000,1 as well and found 100,000 worked best). I am
 using 32 reducers, and reduce seems to run 1000X slower than mapping.

 Unfortunately, the job consistently crashes around 85% reduce
 completion, with HDFS related errors from the HBase machine:

 java.io.IOException: java.io.IOException: All datanodes 127.0.0.1:50010
 are bad. Aborting...
        at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DF
 SClient.java:2525)
        at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.j
 ava:2078)
        at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSCli
 ent.java:2241)
 So I am clearly aware of the mismatch between the big mapreduce
 cluster and the wimpy HBase installation, but why am I seeing
 consistent crashes? Shouldn't the HBase cluster just be slower, not
 unreliable?
 Here is my main question: should I expect that running a real HBase
 cluster will solve my problems and does anyone have experience with a
 map/reduce job that pumps several billion rows into HBase?
 -geoff



Re: HBase Client Maven Dependency in POM

2010-04-07 Thread Jean-Daniel Cryans
Same answer I gave an hour ago to your other email:

Sajeev,

0.20 isn't mavenized, the svn trunk is.
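
If you need the 0.20.3 jar in a Maven build anyway, a common workaround is to
install it into your local repository by hand and depend on it from there,
for example (the coordinates are made up, pick whatever fits your convention):

  mvn install:install-file -Dfile=hbase-0.20.3.jar -DgroupId=org.apache.hadoop.hbase \
      -DartifactId=hbase -Dversion=0.20.3 -Dpackaging=jar

You'll also need the hadoop and zookeeper jars that ship in HBase's lib/ directory.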

J-D

On Wed, Apr 7, 2010 at 10:46 AM, Sajeev Joseph
sajeev.jos...@cypresscare.com wrote:
 I have HBase 0.20.3 up and running on my 'windows/cygwin' platform. Now, I 
 would like to write a Java client to access the HBase server. This java 
 client would be part of a large Enterprise Service Bus (ESB) application we 
 currently have. We use maven as build tool with all our applications, and I 
 would like to add an HBase client dependency in our 'POM' to pull in all the 
 relevant jar files required by the HBase client API.  After spending hours 
 reading through HBase  documentation, I couldn't find this mentioned 
 anywhere.  Am I missing something? Do you have a maven repo where I can pull 
 in all jars required by the HBase Client?

 Thank you,
 Sajeev Joseph




Re: DFS too busy/down? while writing back to HDFS.

2010-04-06 Thread Jean-Daniel Cryans
From DataXceiver's javadoc

/**
 * Thread for processing incoming/outgoing data stream.
 */

So it's a bit different from the handlers AFAIK.

J-D

On Mon, Apr 5, 2010 at 10:57 PM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
 thanks, J.D.
          My cluster has the first problem. BTW, dfs.datanode.max.xcievers
 means the number of concurrent connections for a datanode, right?

 On Tue, Apr 6, 2010 at 12:35 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 Look at your datanode logs around the same time. You probably either have
 this

 http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5

 or that

 http://wiki.apache.org/hadoop/Hbase/FAQ#A6

 Also you see to be putting a fair number of regions on those region
 servers judging by the metrics, do consider setting HBASE_HEAP higher
 than 1GB in conf/hbase-env.sh

 J-D

 On Mon, Apr 5, 2010 at 8:38 PM, steven zhuang
 steven.zhuang.1...@gmail.com wrote:
  greetings,
 
         while I was importing data into my HBase cluster, I found one
  regionserver was down, and by checking the log, I found the following exception:
  *EOFException* (during an HBase flush of the memstore to an HDFS file? not sure)
 
         it seems that it's caused by the DFSClient not working. I don't know the
  exact reason; maybe it's caused by the heavy load on the machine where the
  datanode is residing, or the disk is full. but I am not sure which DFS
  node I should check.
         has anybody met the same problem? any pointer or hint is
  appreciated.
 
        The log is as follows:
 
 
  2010-04-06 03:04:34,065 INFO
 org.apache.hadoop.hbase.regionserver.HRegion:
  Blocking updates for 'IPC Server handler 20 on 60020' on region
  hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking
 128.0m
  size
  2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Completed compaction of 34; new storefile is
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047;
 store
  size is 2.9m
  2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0
  2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Started compaction of 5 file(s)  into
  hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
  2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Added
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438,
  entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to
  hbt2table16,,1270522012397
  2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Completed compaction of 35; new storefile is
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032;
 store
  size is 2.9m
  2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0
  2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Started compaction of 4 file(s)  into
  hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
  2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Added
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130,
  entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to
  hbt2table16,,1270522012397
  2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Added
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005,
  entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to
  hbt2table16,,1270522012397
  2010-04-06 03:04:35,866 DEBUG
  org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
  started.  Attempting to free 20853136 bytes
  2010-04-06 03:04:37,010 DEBUG
  org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
  completed. Freed 20866928 bytes.  Priority Sizes: Single=17.422821MB
  (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0)
  2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception
 in
  createBlockOutputStream java.io.EOFException
  2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
  block blk_-6935524980745310745_1391901
  2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Completed compaction of 36; new storefile is
  hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916;
 store
  size is 2.9m
  2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0
  2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
  Started compaction of 4 file(s)  into
  hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
  2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception
 in
  createBlockOutputStream java.io.*EOFException*
  2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
  block

Re: hbase mapreduce scan

2010-04-06 Thread Jean-Daniel Cryans
Or put it in MySQL, or in S3, or...or... so my point was that you need
a recipient that transcends the JVMs ;)

So it is doable and pretty normal to output to tables the result of
MRs that map other tables; we have dozens of those here at
StumbleUpon. But if it fits in a single HashMap in a single JVM, my
guess is that the output is very small, hence this is an operation done
for live clients and not suitable for MR.
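
For the record, wiring up a table-in/table-out job with the new mapreduce
package looks roughly like the sketch below (the table names, the "cf"/"copy"
column and the identity-style mapper are all made up for illustration):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class TableToTable {
  // Emits the row key and a Put that copies the row's first value.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("copy"), value.value());
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "table-to-table");
    job.setJarByClass(TableToTable.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner caching for batch jobs
    scan.setCacheBlocks(false);  // don't pollute the block cache
    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);
    // IdentityTableReducer just writes the Puts to the output table.
    TableMapReduceUtil.initTableReducerJob("target_table",
        IdentityTableReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}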

J-D

On Tue, Apr 6, 2010 at 4:34 AM, Michael Segel michael_se...@hotmail.com wrote:


 J-D,

 There's an alternative...

 He could write a M/R that takes the input from a scan() , do something, 
 reduce() and then output the reduced set back to hbase in the form of a temp 
 table.
  (Even an in-memory temp table) and then at the end pull the data out into a
  hash table?

 In theory this should be possible, but I haven't had time to play with in 
 memory tables

 No?


 Thx

 -Mike

 Date: Mon, 5 Apr 2010 09:57:02 -0700
 Subject: Re: hbase mapreduce scan
 From: jdcry...@apache.org
 To: hbase-user@hadoop.apache.org

 You want to put the result in a HashMap? MapReduce is a batch
 processing framework that runs multiple parallel JVMs over a cluster
 of machines so I don't see how you could simply output in a HashMap
 (unless you don't mind outputting on disk, then reading it back into a
 HashMap).

 So I will guess that you want to do a live query against HBase, here
 MR won't help you since that is meant for bulk processing which
 usually takes more than a minute.

 What you want to use is a Scan, using HTable. The unit tests have tons
 of examples of how to use a scanner; look in the
 org.apache.hadoop.hbase.client package and you will find what you need.
 The main client package also contains some examples
 http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/package-summary.html

 J-D

 On Sun, Apr 4, 2010 at 11:18 AM, Jürgen Jakobitsch jakobits...@punkt.at 
 wrote:
  hi,
 
  i'm totally new to hbase and mapreduce and could really need some
  pointer into the right direction for the following situation.
 
  i managed to run a basic mapreduce example - analog to Export.java
  in the hbase.mapreduce package.
 
  what i need to achieve is the following :
 
  do a map/reduce scan on a hbase table and put the results
  into a HashMap.
 
  could someone point me to an example.
 
  any help really appreciated
 
  wkr turnguard.com/turnguard
 
  --
  punkt. netServices
  __
  Jürgen Jakobitsch
  Codeography
 
  Lerchenfelder Gürtel 43 Top 5/2
  A - 1160 Wien
  Tel.: 01 / 897 41 22 - 29
  Fax: 01 / 897 41 22 - 22
 
  netServices http://www.punkt.at
 
 



Re: enabling hbase metrics on a running instance

2010-04-06 Thread Jean-Daniel Cryans
This boils down to the question: can you enable JMX while the JVM is
running? The answer is no (afaik).

More doc here 
http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

J-D

On Tue, Apr 6, 2010 at 4:12 PM, Igor Ranitovic irani...@gmail.com wrote:
 Is it possible to enable the hbase metrics without a restart? Thanks.

 i.



Re: hbase mapreduce scan

2010-04-05 Thread Jean-Daniel Cryans
You want to put the result in a HashMap? MapReduce is a batch
processing framework that runs multiple parallel JVMs over a cluster
of machines so I don't see how you could simply output in a HashMap
(unless you don't mind outputting on disk, then reading it back into a
HashMap).

So I will guess that you want to do a live query against HBase, here
MR won't help you since that is meant for bulk processing which
usually takes more than a minute.

What you want to use is a Scan, using HTable. The unit tests have tons
of examples of how to use a scanner; look in the
org.apache.hadoop.hbase.client package and you will find what you need.
The main client package also contains some examples
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/package-summary.html
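
A bare-bones sketch of a scan feeding a HashMap, with made-up table, family
and qualifier names, would look something like this:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanToMap {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "my_table");
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
    ResultScanner scanner = table.getScanner(scan);
    Map<String, String> map = new HashMap<String, String>();
    try {
      for (Result r : scanner) {
        // Row key -> cell value, both as Strings for simplicity.
        map.put(Bytes.toString(r.getRow()),
                Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"))));
      }
    } finally {
      scanner.close();  // always release the scanner
    }
    System.out.println("read " + map.size() + " rows");
  }
}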

J-D

On Sun, Apr 4, 2010 at 11:18 AM, Jürgen Jakobitsch jakobits...@punkt.at wrote:
 hi,

 i'm totally new to hbase and mapreduce and could really need some
 pointer into the right direction for the following situation.

 i managed to run a basic mapreduce example - analog to Export.java
 in the hbase.mapreduce package.

 what i need to achieve is the following :

 do a map/reduce scan on a hbase table and put the results
 into a HashMap.

 could someone point me to an example.

 any help really appreciated

 wkr turnguard.com/turnguard

 --
 punkt. netServices
 __
 Jürgen Jakobitsch
 Codeography

 Lerchenfelder Gürtel 43 Top 5/2
 A - 1160 Wien
 Tel.: 01 / 897 41 22 - 29
 Fax: 01 / 897 41 22 - 22

 netServices http://www.punkt.at




Re: DFS too busy/down? while writing back to HDFS.

2010-04-05 Thread Jean-Daniel Cryans
Look at your datanode logs around the same time. You probably either have this

http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5

or that

http://wiki.apache.org/hadoop/Hbase/FAQ#A6

Also you see to be putting a fair number of regions on those region
servers judging by the metrics, do consider setting HBASE_HEAP higher
than 1GB in conf/hbase-env.sh

J-D

On Mon, Apr 5, 2010 at 8:38 PM, steven zhuang
steven.zhuang.1...@gmail.com wrote:
 greetings,

        while I was importing data into my HBase Cluster, I found one
 regionserver is down, and by check the log, I found following exceptoin:
 *EOFException*(during HBase flush memstore to HDFS file? not sure)

        seems that it's caused by DFSClient not working, I don't know the
 exact reason, maybe it's caused by the heavy load on the machine where the
 datanode is residing on, or the disk is full. but I am not sure which DFS
 node should I check.
        has anybody met the same problem? any pointer or hint is
 appreciated.

       The log is as follows:


 2010-04-06 03:04:34,065 INFO org.apache.hadoop.hbase.regionserver.HRegion:
 Blocking updates for 'IPC Server handler 20 on 60020' on region
 hbt2table16,,1270522012397: memstore size 128.0m is >= than blocking 128.0m
 size
 2010-04-06 03:04:34,712 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 34; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/34/854678344516838047; store
 size is 2.9m
 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 35: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:34,715 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 5 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:35,055 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/184/1530971405029654438,
 entries=1489, sequenceid=2914917785, memsize=203.8k, filesize=88.6k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,442 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 35; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/35/2952180521700205032; store
 size is 2.9m
 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 36: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:35,445 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 4 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:35,469 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/185/1984548574711437130,
 entries=2105, sequenceid=2914917785, memsize=286.7k, filesize=123.9k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,711 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Added
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/186/2470661482474884005,
 entries=3031, sequenceid=2914917785, memsize=414.0k, filesize=179.1k to
 hbt2table16,,1270522012397
 2010-04-06 03:04:35,866 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
 started.  Attempting to free 20853136 bytes
 2010-04-06 03:04:37,010 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
 completed. Freed 20866928 bytes.  Priority Sizes: Single=17.422821MB
 (18269152), Multi=150.70126MB (158021728),Memory=0.0MB (0)
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-6935524980745310745_1391901
 2010-04-06 03:04:37,607 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Completed compaction of 36; new storefile is
 hdfs://rra-03:8887hbase/hbt2table16/2144402082/36/1570089400510240916; store
 size is 2.9m
 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Compaction size of 37: 2.9m; Skipped 0 file(s), size: 0
 2010-04-06 03:04:37,612 DEBUG org.apache.hadoop.hbase.regionserver.Store:
 Started compaction of 4 file(s)  into
 hbase/hbt2table16/compaction.dir/2144402082, seqid=2914432737
 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.*EOFException*
 2010-04-06 03:04:37,964 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_2467598422201289982_1391902
 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:43,568 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-2065206049437531800_1391902
 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
 createBlockOutputStream java.io.EOFException
 2010-04-06 03:04:44,044 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_-3059563223628992257_1391902
 2010-04-06 03:05:01,588 WARN org.apache.hadoop.hdfs.DFSClient: 

Re: More about LogFlusher

2010-04-02 Thread Jean-Daniel Cryans
LogFlusher isn't doing any FS operations itself; it calls HLog, which does them.

HLog calls sync on the SequenceFile.Writer, which just writes a marker, and
also calls fsSync if you patched your hadoop with HDFS-200, 826 and 142 (and
replaced hbase's hadoop jar with the patched one).

J-D

On Fri, Apr 2, 2010 at 2:05 AM, ChingShen chingshenc...@gmail.com wrote:
 Hi,

  Does anyone know more about
 org.apache.hadoop.hbase.regionserver.LogFlusher?
  I don't understand why it just invokes SequenceFile.Writer.sync(); it just
 writes a marker into the file.
 Can anyone explain it to me please?

 Thanks.

 Shen



Re: Failed to create /hbase.... KeeperErrorCode = ConnectionLoss for /hbase

2010-04-01 Thread Jean-Daniel Cryans
If the master doesn't shut down, it means it's waiting on something...
have you looked at the logs?

You say you ran ./jps ... did you install that in the local directory?
Also, what do you mean it didn't work as well? What didn't work? Did the
command not return anything, or was the HMaster process not listed?

Also did you check the zookeeper logs like Patrick said? You should
see in there when the master tries to connect, and you should see why
it wasn't able to do so.

To help you I need more data about your problem.

J-D

On Thu, Apr 1, 2010 at 11:39 AM, jayavelu jaisenthilkumar
joysent...@gmail.com wrote:
 Hi Daniel,
                   I removed the property tags from the hbase-site.xml.

 Same error occurs.

 Also one strange behaviour: if I run ./stop-hbase.sh, the terminal says
 stopping master 
 and it never stops.

 I couldn't use ./jps to check the java processes in this scenario; it didn't work
 either.  So I killed the HMaster process ( ps -ef | grep java ).

 Also I need to manually kill the HRegionServer on the master, slave1 and slave2.

 Any suggestions please...

 Regs,
 senthil
 On 31 March 2010 19:15, Jean-Daniel Cryans jdcry...@apache.org wrote:

 You set the tick time like this:

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>1</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of milliseconds of each tick.  See
    zookeeper.session.timeout description.
    </description>
  </property>

 1 means HBase has to report to zookeeper every 1 millisecond and if
 for any reason it doesn't after 20ms, the session is expired (!!). I
 recommend using the default value.

 Also you should keep the same config on every node, rsync can do wonders.

 J-D

 On Wed, Mar 31, 2010 at 9:24 AM, jayavelu jaisenthilkumar
 joysent...@gmail.com wrote:
  Hi,
             I am using 1 master and 2 slaves one has password for ssh.
 
  I am using hadoop0.20.1 and hbase0.20.3(direct one not upgraded one)
 
  1)The slave one with password is could not be disabled, i removed the
 whole
  .ssh directory try to ssh-keygen with passwordless phrase, still i am
 asked
  for the password  when i
  ssh localhost
 
  2) I am able to run hadoop and successfuly run the Mapreduce in the
 hadoop
  environment as per the Running Hadoop On Ubuntu Linux (Multi-Node
 Cluster)
  by noel
 
  3) I am now following the tutorial hbase: overview HBase 0.20.3 API
 
  Its not clearly stated as the mulitnode cluster hadoop for the
 distributed
  mode hbase.
 
  I ran the hdfs and the hbase using start-dfs.sh and start-hbase.sh
  respectively.
 
  The master log indicates connection loss on the /hbase :  ( is this hbase
 is
  created by Hbase or should we do to create it again
 
  2010-03-31 16:45:57,850 INFO org.apache.zookeeper.
  ClientCnxn: Attempting connection to server Hadoopserver/
 192.168.1.65:
  2010-03-31 16:45:57,858 INFO org.apache.zookeeper.ClientCnxn: Priming
  connection to java.nio.channels.SocketChannel[connected local=/
  192.168.1.65:43017 remote=Hadoopserver/192.168.1.65:]
  2010-03-31 16:45:57,881 INFO org.apache.zookeeper.ClientCnxn: Server
  connection successful
  2010-03-31 16:45:57,883 WARN org.apache.zookeeper.ClientCnxn: Exception
  closing session 0x0 to sun.nio.ch.selectionkeyi...@11c2b67
  java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
  lim=4 cap=4]
     at
 org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
  2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring
  exception during shutdown input
  java.net.SocketException: Transport endpoint is not connected
     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
     at
  sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
     at
  org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
  2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring
  exception during shutdown output
  java.net.SocketException: Transport endpoint is not connected
     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
     at
  sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
     at
  org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
  2010-03-31 16:45:57,933 INFO
 org.apache.hadoop.hbase.master.RegionManager:
  -ROOT- region unset (but not set to be reassigned)
  2010-03-31 16:45:57,934 INFO
 org.apache.hadoop.hbase.master.RegionManager:
  ROOT inserted into regionsInTransition
  2010-03-31 16:45:58,024 DEBUG
  org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to read

Re: Why did HBase dead after a regionserver stopped.

2010-03-31 Thread Jean-Daniel Cryans
(SocketIOWithTimeout.java:246)
at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

 2010-03-30 00:58:59,672 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
 172.23.51.55:50010, storageID=DS-225596341-172.23.51.55-50010-1261706639224,
 infoPort=50075, ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 48 millis timeout while waiting for
 channel to be ready for write. ch :
 java.nio.channels.SocketChannel[connected local=/172.23.51.55:50010 remote=/
 172.23.51.55:47568]
at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

 In hbase log, I found
 org.apache.hadoop.hbase.NotServingRegionException: web_info,,1267870002080
at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
at
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1896)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2010-03-31 14:16:39,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
 handler 0 on 60020, call openScanner([...@6defe475, startRow=, stopRow=,
 maxVersions=1, caching=10, cacheBlocks=false,
 timeRange=[0,9223372036854775807), families=ALL) from 172.23.52.58:42223:
 error: org.apache.hadoop.hbase.NotServingRegionException:
 web_info,,1267870002080


 2010/3/31 Jean-Daniel Cryans jdcry...@apache.org

 Please provide us with the usuals: Hadoop/HBase version,
 configurations for both, hardware, OS, etc

 Also did you take a look at search.38d.cm3's region server log? Any
 obvious exceptions and if you google search them, can you find the
 solution?

 Thx

 J-D

 On Tue, Mar 30, 2010 at 7:50 PM, 无名氏 sitong1...@gmail.com wrote:
  I have set up an HBase cluster, and the regionserver list is
  search10a.cm3
  search10b.cm3
  search162a.cm3
  search166a.cm3
  search168a.cm3
  search16a.cm3
  search178a.cm3
  search180a.cm3
  search182a.cm3
  search184a.cm3
  search188a.cm3
  search189a.cm3
  search18b.cm3
  search190a.cm3
  search192a.cm3
  search200t.cm3
  search33d.cm3
  search34c.cm3
  search34d.cm3
  search35c.cm3
  search35d.cm3
  search38d.cm3
  search3a.cm3
  search49a.cm3
  search4a.cm3
  search50a.cm3
  search51a.cm3
  search54b.cm3
  search55b.cm3
  search55d.cm3
  search56b.cm3
  search5a.cm3
  search60a.cm3
  search61a.cm3
  search62a.cm3
  build2.cme
 
  The regionserver search38d.cm3 stopped yesterday.
 
  Now when I run the hbase shell and execute the list command, it throws an exception.
 
  NativeException:
 org.apache.hadoop.hbase.client.RetriesExhaustedException:
  Trying to contact region server null for region , row '', but failed
 after 5
  attempts.
  Exceptions:
  org.apache.hadoop.hbase.NotServingRegionException:
  org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
 at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
 at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
 at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
 at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
  org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
 at
  org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 
  org.apache.hadoop.hbase.NotServingRegionException

Re: Is NotServingRegionException really an Exception?

2010-03-31 Thread Jean-Daniel Cryans
Arber,

If your cluster doesn't recover, it means there's something else going
on. Feel free to start a new thread on this mailing list to discuss
that, posting relevant informations like version, hardware,
configurations and logs.

J-D

On Wed, Mar 31, 2010 at 9:39 AM, Yabo Xu arber.resea...@gmail.com wrote:
 Sorry for interrupting the thread. We also get the annoying
 NotServingRegionException once in a while (especially after intensive
 writing), and when it happens, it seems that the only way out is to stop all the
 programs and restart HBase.

 Any better way to deal with it? (I tried the flush operation in the shell, but
 it does not work.)

 Or how to avoid this from happening?

 Thanks,
 Arber

 On Wed, Mar 31, 2010 at 11:44 PM, Stack st...@duboce.net wrote:

 I always thought that the throwing of an exception to signal moved
 region was broke if only for the reason that it disturbing to new
 users.  See https://issues.apache.org/jira/browse/HBASE-72

 Would be nice to change it.  I don't think it easy though.  We'd need
 to rig the RPC so calls were enveloped or some such so we could pass
 status messages along with (or instead of) a query results.

 St.Ack


 On Wed, Mar 31, 2010 at 8:06 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  On Wed, Mar 31, 2010 at 11:02 AM, Gary Helmling ghelml...@gmail.com
 wrote:
 
  Well I would still view it as an exceptional condition.  The client
 asked
  for data back from a server that does not own that data.  Sending back
 an
  exception seems like the appropriate response, to me at least.  It's
 just
  an
  exceptional condition that's allowed to happen in favor of the
 optimization
  of caching region locations in memory on the client.
 
  I could see the reporting of the exception being misleading though if
 it's
  being logged at an error or warn level when it's a normal part of
  operations.  What's the logging level of the messages?
 
 
  On Wed, Mar 31, 2010 at 10:51 AM, Al Lias al.l...@gmx.de wrote:
 
   Am 31.03.2010 16:47, schrieb Gary Helmling:
NotServingRegionException is a normal part of operations when
 regions
transition (ie due to splits).  It's how the region server signals
 back
   to
the client that it needs to re-lookup the region location in .META.
   (which
is normally cached in memory by the client, so can become stale).
   
I'm sure it can also show up as a symptom of other problems, but if
   you're
not seeing any other issues, then it's nothing to be concerned
 about.
   
  
   Thx Gary,
  
          this is my point: I see this many times in the (production)
 logs
   when
    it is actually nothing to worry about. Shouldn't this rather be a normal
    response from a region server, instead of an Exception?
  
   Al
  
   
On Wed, Mar 31, 2010 at 7:38 AM, Al Lias al.l...@gmx.de wrote:
   
As I do see this Exception really often in our logs. I wonder if
 this
indicates a regular thing (within splits etc) or if this is
 something
that should not normally happen.
   
I see it often in Jira as a reason for something else that fails,
 but
for a regular client request, where the client not perfectly
  up-to-date
with region information it looks as something normal. Am I right
 here?
   
   
Al
   
   
  
  
 
  The LDAP APIs throw a ReferralException when you try to update a read-only
  slave, so there is a precedent for that. But it's true that an exception may be
  strong for something that is technically a warning.
 




Re: Failed to create /hbase.... KeeperErrorCode = ConnectionLoss for /hbase

2010-03-31 Thread Jean-Daniel Cryans
You set the tick time like this:

 <property>
   <name>hbase.zookeeper.property.tickTime</name>
   <value>1</value>
   <description>Property from ZooKeeper's config zoo.cfg.
   The number of milliseconds of each tick.  See
   zookeeper.session.timeout description.
   </description>
 </property>

1 means HBase has to report to zookeeper every 1 millisecond and if
for any reason it doesn't after 20ms, the session is expired (!!). I
recommend using the default value.

Also you should keep the same config on every node, rsync can do wonders.

J-D

On Wed, Mar 31, 2010 at 9:24 AM, jayavelu jaisenthilkumar
joysent...@gmail.com wrote:
 Hi,
            I am using 1 master and 2 slaves; one of them requires a password for ssh.

 I am using hadoop0.20.1 and hbase0.20.3 (direct one, not upgraded)

 1) The password on that one slave could not be disabled. I removed the whole
 .ssh directory and tried ssh-keygen with a passwordless phrase; still I am asked
 for the password when I
 ssh localhost

 2) I am able to run hadoop and successfully run MapReduce in the hadoop
 environment as per the Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
 by noel

 3) I am now following the tutorial hbase: overview HBase 0.20.3 API

 It's not stated as clearly for distributed-mode hbase as the multi-node
 cluster setup is for hadoop.

 I ran the hdfs and the hbase using start-dfs.sh and start-hbase.sh
 respectively.

 The master log indicates connection loss on /hbase: (is this /hbase created
 by HBase itself, or do we need to create it again?)

 2010-03-31 16:45:57,850 INFO org.apache.zookeeper.
 ClientCnxn: Attempting connection to server Hadoopserver/192.168.1.65:
 2010-03-31 16:45:57,858 INFO org.apache.zookeeper.ClientCnxn: Priming
 connection to java.nio.channels.SocketChannel[connected local=/
 192.168.1.65:43017 remote=Hadoopserver/192.168.1.65:]
 2010-03-31 16:45:57,881 INFO org.apache.zookeeper.ClientCnxn: Server
 connection successful
 2010-03-31 16:45:57,883 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x0 to sun.nio.ch.selectionkeyi...@11c2b67
 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
 lim=4 cap=4]
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring
 exception during shutdown input
 java.net.SocketException: Transport endpoint is not connected
    at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
    at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
    at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
    at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
 2010-03-31 16:45:57,885 WARN org.apache.zookeeper.ClientCnxn: Ignoring
 exception during shutdown output
 java.net.SocketException: Transport endpoint is not connected
    at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
    at
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
    at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
    at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
 2010-03-31 16:45:57,933 INFO org.apache.hadoop.hbase.master.RegionManager:
 -ROOT- region unset (but not set to be reassigned)
 2010-03-31 16:45:57,934 INFO org.apache.hadoop.hbase.master.RegionManager:
 ROOT inserted into regionsInTransition
 2010-03-31 16:45:58,024 DEBUG
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to read:
 org.apache.zookeeper.KeeperException$ConnectionLossException:
 KeeperErrorCode = ConnectionLoss for /hbase/master
 2010-03-31 16:45:58,422 INFO org.apache.zookeeper.ClientCnxn: Attempting
 connection to server Hadoopclient1/192.168.1.2:
 2010-03-31 16:45:58,423 INFO org.apache.zookeeper.ClientCnxn: Priming
 connection to java.nio.channels.SocketChannel[connected local=/
 192.168.1.65:51219 remote=Hadoopclient1/192.168.1.2:]
 2010-03-31 16:45:58,423 INFO org.apache.zookeeper.ClientCnxn: Server
 connection successful
 2010-03-31 16:45:58,436 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x0 to sun.nio.ch.selectionkeyi...@17b6643
 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
 lim=4 cap=4]
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
 2010-03-31 16:45:58,437 WARN org.apache.zookeeper.ClientCnxn: Ignoring
 exception during shutdown input
 java.net.SocketException: Transport endpoint is not connected
    at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
    at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
    at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
    at
 

Re: Region spiting, compaction and merging

2010-03-31 Thread Jean-Daniel Cryans
Hey Michal!

Currently there's no tool you can use except cron; you can request a
major compaction on a table by doing something like: echo
"major_compact 'some_table'" | /path/to/hbase/bin/hbase shell
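
For example, a crontab entry along these lines (paths are just an example)
would kick one off every day at 5 a.m.:

  0 5 * * * echo "major_compact 'some_table'" | /path/to/hbase/bin/hbase shell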

You can merge regions using the merge tool but it must be run while
HBase is down. You can run it like that: bin/hbase
org.apache.hadoop.hbase.util.Merge

Enabling compression on that table will allow it to stay small; use
LZO (see the wiki).
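
From the shell that is something along these lines (the family name is an
example, and on 0.20 the table has to be disabled while you alter it):

  disable 'some_table'
  alter 'some_table', {NAME => 'family', COMPRESSION => 'LZO'}
  enable 'some_table'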

J-D

2010/3/31 Michał Podsiadłowski podsiadlow...@gmail.com:
 Hi hbase fans

 We started our cluster (HBase trunk + CDH3 with hbase-dedicated
 patches) on our production environment and we have left it running for 2
 days now. Everything is working nicely, but we didn't try to break it yet as
 we did previously ;)
 Still, there are a few things that concern me.
 We have one table with only a few rows - around 200, each a few tens
 of KB - which is updated quite frequently - all records a few times an
 hour. Sounds trivial, but it keeps growing and splitting.
 Currently, after 2 days, there are 177 records kept in 4 regions, which
 IMHO is not good. I had to manually run a major compaction to get rid of
 the invalidated data (from around 500MB down to 0MB, plus a little in the
 memStore according to the UI).
 As far as I can see in the logs there were no major compactions since we
 started 2 days ago. The question is - is it normal that tables grow so
 quickly and, due to being stuffed with garbage, get split?
 Secondly, is there a way to force hbase to perform a major compaction at
 some particular time - e.g. 5 a.m. - so it doesn't generate unnecessary
 load during hot periods, like in the evening, when there is a strong
 demand for performance? Or maybe I am exaggerating the problem and the
 influence on the whole system is negligible?

 Third, is there a way to merge split regions? As far as I can see
 there is https://issues.apache.org/jira/browse/HBASE-420, which is a
 minor issue.

 Cheers,
 Michal



Re: web interface is fragile?

2010-03-31 Thread Jean-Daniel Cryans
Dave,

Can you pastebin the exact error that was returned by the MR job? That
looks like it's client-side (from HBase point of view).

WRT the .META. and the master, the web page does do a request on every
hit so if the region is unavailable then you can't see it. Looks like
you kill -9'ed the region server? If so, it takes a minute to detect
the region server failure and then split the write-ahead-logs so if
.META. was on that machine, it will take that much time to have a
working web page.

Instead of kill -9, simply go on the node and run
./bin/hbase-daemon.sh stop regionserver

J-D

On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David buttl...@llnl.gov wrote:
 Hi,
 I have a small cluster (6 nodes, 1 master and 5 region server/data nodes).  
 Each node has lots of memory and disk (16GB of heap dedicated to 
 RegionServers), 4 TB of disk per node for hdfs.
 I have a table with about 1 million rows in hbase - that's all.  Currently it 
 is split across 50 regions.
 I was monitoring this with the hbase web gui and I noticed that a lot of the 
 heap was being used (14GB).  I was running a MR job and I was getting an 
 error to the console that launched the job:
 Error: GC overhead limit exceeded hbase

 First question: is this going to hose the whole system?  I didn't see the 
 error in any of the hbase logs, so I assume that it was purely a client issue.

 So, naively thinking that maybe the GC had moved everything to permgen and 
 just wasn't cleaning up, I thought I would do a rolling restart of my region 
 servers and see if that cleared everything up.  The first server I killed 
 happened to be the one that was hosting the .META. table.  Subsequently the 
 web gui failed.  Looking at the errors, it seems that the web gui essentially 
 caches the address for the meta table and blindly tries connecting on every 
 request.  I suppose I could restart the master, but this does not seem like 
 desirable behavior.  Shouldn't the cache be refreshed on error?  And since 
 there is no real code for the GUI, just a jsp page, doesn't this mean that 
 this behavior could be seen in other applications that use HMaster?

 Corrections welcome
 Dave




Re: Data size

2010-03-31 Thread Jean-Daniel Cryans
HBase is column-oriented; every cell is stored with the row, family,
qualifier and timestamp, so every piece of data brings larger
disk usage. Without any knowledge of your keys, I can't comment much
more.
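
To get a feel for the blow-up, here is a rough back-of-the-envelope
sketch; the ~20 byte per-cell constant is my own approximation of the
0.20 KeyValue framing, and the sizes in main() are made up:

public class CellSizeEstimate {

  // Approximate fixed framing per cell: key/value length prefixes,
  // row/family length fields, timestamp and type byte.
  private static final int FIXED_OVERHEAD = 20;

  // HBase repeats the row key, family and qualifier for every cell.
  static long estimate(int rowKeyLen, int familyLen, int qualifierLen,
                       int valueLen, long cellCount) {
    long perCell = FIXED_OVERHEAD + rowKeyLen + familyLen + qualifierLen + valueLen;
    return perCell * cellCount;
  }

  public static void main(String[] args) {
    // e.g. 40-byte row keys, family "d", 10-byte qualifiers and values,
    // 50 columns per row, 100 million rows:
    long bytes = estimate(40, 1, 10, 10, 50L * 100000000L);
    System.out.println((bytes >> 30) + " GB before compression and HDFS replication");
  }
}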

Then HDFS keeps a trash so every file compacted will end up there...
if you just did the import, there will be a lot of these.

Finally if you imported the data more than once, hbase keeps 3
versions by default.

So in short, is it reasonable? Answer: it depends!

J-D

2010/3/31  y_823...@tsmc.com:
 Hi,

 We've dumped Oracle data to files, then put these files into different
 HBase tables.
 The size of these files is 35G; we saw HDFS usage go up to 562G after
 putting them into HBase.
 Is that reasonable?
 Thanks



 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
  Meat-free Monday, go vegetarian to save the Earth (Meat Free Monday Taiwan)








Re: web interface is fragile?

2010-03-31 Thread Jean-Daniel Cryans
The fact we see the exception 10 times means that
getRegionServerWithRetries got that error 10 times before
abandoning... Are you sure you don't see that in the region server's
log on 10.0.1.3?

Thx,

J-D

On Wed, Mar 31, 2010 at 6:26 PM, Buttler, David buttl...@llnl.gov wrote:
 Hi J-D,
 Thanks for taking a look at this.  The error that I received is:
 http://pastebin.com/ZnhVA5B0
 This is the client side.
 It's a little strange, as I have been running this task several times in the past,
 and my client heap size is set to 4GB.  I can try doubling it and see if that
 helps.
 Dave


 -Original Message-
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel 
 Cryans
 Sent: Wednesday, March 31, 2010 6:11 PM
 To: hbase-user@hadoop.apache.org
 Subject: Re: web interface is fragile?

 Dave,

 Can you pastebin the exact error that was returned by the MR job? That
 looks like it's client-side (from HBase point of view).

 WRT the .META. and the master, the web page does do a request on every
 hit so if the region is unavailable then you can't see it. Looks like
 you kill -9'ed the region server? If so, it takes a minute to detect
 the region server failure and then split the write-ahead-logs so if
 .META. was on that machine, it will take that much time to have a
 working web page.

 Instead of kill -9, simply go on the node and run
 ./bin/hbase-daemon.sh stop regionserver

 J-D

 On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David buttl...@llnl.gov wrote:
 Hi,
 I have a small cluster (6 nodes, 1 master and 5 region server/data nodes).  
 Each node has lots of memory and disk (16GB of heap dedicated to 
 RegionServers), 4 TB of disk per node for hdfs.
 I have a table with about 1 million rows in hbase - that's all.  Currently 
 it is split across 50 regions.
 I was monitoring this with the hbase web gui and I noticed that a lot of the 
 heap was being used (14GB).  I was running a MR job and I was getting an 
 error to the console that launched the job:
 Error: GC overhead limit exceeded hbase

 First question: is this going to hose the whole system?  I didn't see the 
 error in any of the hbase logs, so I assume that it was purely a client 
 issue.

 So, naively thinking that maybe the GC had moved everything to permgen and 
 just wasn't cleaning up, I thought I would do a rolling restart of my region 
 servers and see if that cleared everything up.  The first server I killed 
 happened to be the one that was hosting the .META. table.  Subsequently the 
 web gui failed.  Looking at the errors, it seems that the web gui 
 essentially caches the address for the meta table and blindly tries 
 connecting on every request.  I suppose I could restart the master, but this 
 does not seem like desirable behavior.  Shouldn't the cache be refreshed on 
 error?  And since there is no real code for the GUI, just a jsp page, 
 doesn't this mean that this behavior could be seen in other applications 
 that use HMaster?

 Corrections welcome
 Dave





Re: Error Page on wiki?

2010-03-30 Thread Jean-Daniel Cryans
Currently we have http://wiki.apache.org/hadoop/Hbase/FAQ and
http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Feel free to improve it!

J-D

On Tue, Mar 30, 2010 at 4:11 PM, Buttler, David buttl...@llnl.gov wrote:
 Is there an error page on the wiki listing stack traces hbase users see, and 
 associating them with potential causes?  I browsed around but didn't see it.  
 It would be nice to capture some of the knowledge that gets distributed on 
 the mailing list, and it would really help me to understand if the errors I 
 am seeing have a known cause or if I am seeing something new.  I will be 
 happy to contribute my errors and solutions as soon as they are available :)


 Thanks,
 Dave




Re: Why did HBase die after a regionserver stopped.

2010-03-30 Thread Jean-Daniel Cryans
Please provide us with the usual details: Hadoop/HBase version,
configurations for both, hardware, OS, etc.

Also, did you take a look at search38d.cm3's region server log? Any
obvious exceptions, and if you google them, can you find the
solution?

Thx

J-D

On Tue, Mar 30, 2010 at 7:50 PM, 无名氏 sitong1...@gmail.com wrote:
 I have an HBase cluster, and the regionserver list is:
 search10a.cm3
 search10b.cm3
 search162a.cm3
 search166a.cm3
 search168a.cm3
 search16a.cm3
 search178a.cm3
 search180a.cm3
 search182a.cm3
 search184a.cm3
 search188a.cm3
 search189a.cm3
 search18b.cm3
 search190a.cm3
 search192a.cm3
 search200t.cm3
 search33d.cm3
 search34c.cm3
 search34d.cm3
 search35c.cm3
 search35d.cm3
 search38d.cm3
 search3a.cm3
 search49a.cm3
 search4a.cm3
 search50a.cm3
 search51a.cm3
 search54b.cm3
 search55b.cm3
 search55d.cm3
 search56b.cm3
 search5a.cm3
 search60a.cm3
 search61a.cm3
 search62a.cm3
 build2.cme

 The regionserver search38d.cm3 stopped yesterday.

 Now when I run the hbase shell and execute the list command, it throws an exception.

 NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Trying to contact region server null for region , row '', but failed after 5
 attempts.
 Exceptions:
 org.apache.hadoop.hbase.NotServingRegionException:
 org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 org.apache.hadoop.hbase.NotServingRegionException:
 org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 org.apache.hadoop.hbase.NotServingRegionException:
 org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 org.apache.hadoop.hbase.NotServingRegionException:
 org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 org.apache.hadoop.hbase.NotServingRegionException:
 org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1761)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)


        from org/apache/hadoop/hbase/client/HConnectionManager.java:1002:in
 `getRegionServerWithRetries'
        from org/apache/hadoop/hbase/client/MetaScanner.java:55:in
 `metaScan'
        from 

Re: Short DNS outage leads to No .META. found

2010-03-29 Thread Jean-Daniel Cryans
This was fixed in https://issues.apache.org/jira/browse/HBASE-2174,
will be available in 0.20.4 (or you can patch it on your 0.20.3,
should apply easily).

J-D

On Mon, Mar 29, 2010 at 3:58 AM, Al Lias al.l...@gmx.de wrote:
 We have a DNS installation with HA logic that may fail for, say, 10
 seconds.

 In such a case we experience the following:

 * DNS goes down
 * The Master gets this: Received report from unknown server -- telling
 it to MSG_CALL_SERVER_STARTUP (Probably the IP is unknown)
 * The Regionservers do as directed, zookeeper logs state that /hbase/rs/
 nodes are updated
 * DNS goes up

 Now there is either no master selection or a wrong one, and no region can be
 served anymore. Also, no other MSG_CALL_SERVER_STARTUP appears, which could
 reanimate the cluster...

 We use host names in the regionservers file.

 What could we change to be more robust against such a problem?

 Thx,

   Al



Re: Zookeeper session lost

2010-03-29 Thread Jean-Daniel Cryans
I see

2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
79410ms, ten times longer than scheduled
: 5000
2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
78781ms, ten times longer than scheduled
: 3000

That means a sleeping thread slept for, in the first case, 79 seconds
instead of 5. This is due to a garbage collection by the JVM, aka a
"stop the world" pause. Since your region server stopped answering for
that long (more than the default timeout of 60 seconds), it was
considered dead, and when it figured that out it shut itself down to stop
serving its regions, since they may already be served by another region
server (this is why it doesn't retry to connect).

This mailing list has quite a few threads about resolving that kind of
problem; I suggest searching the archives (you will mainly learn about
giving HBase more than the default 1GB of heap, making sure you don't
swap, and not CPU-starving your region servers).
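
As a starting point, something along these lines in conf/hbase-env.sh is
what usually comes out of those threads; the heap size and GC flags here
are placeholders to adapt to your hardware, not a recommendation:

# Give the region servers more than the 1GB default (value is in MB)
# and use the concurrent collector to shorten stop-the-world pauses.
export HBASE_HEAPSIZE=4000
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"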

J-D

On Mon, Mar 29, 2010 at 4:27 AM, Peter Falk pe...@bugsoft.nu wrote:
 Hi,

 One of our region servers was shut down with the following messages in the
 log. It seems like communication with the zookeeper timed out and when it
 later reconnected, the session was expired and the region server then shut
 itself down. It seems strange to me that it should shut down; why did it not try
 to create a new session instead? Any ideas on how to prevent similar
 problems in the future?

 2010-03-28 20:24:27,432 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x278bd16a96000f to sun.nio.
 ch.selectionkeyi...@355811ec
 java.io.IOException: TIMED OUT
        at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 79410ms, ten times longer than scheduled
 : 5000
 2010-03-28 20:24:27,439 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
 78781ms, ten times longer than scheduled
 : 3000
 2010-03-28 20:24:27,433 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x278bd16a96000d to sun.nio.ch.selectionkeyi...@2927fa12
 java.io.IOException: TIMED OUT
        at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
 2010-03-28 20:24:28,291 INFO org.apache.zookeeper.ClientCnxn: Attempting
 connection to server michelob/192.168.10.48:2181
 2010-03-28 20:24:28,291 INFO org.apache.zookeeper.ClientCnxn: Priming
 connection to java.nio.channels.SocketChannel[connected local=/
 192.168.10.47:36626 remote=michelob/192.168.10.48:2181]
 2010-03-28 20:24:28,292 INFO org.apache.zookeeper.ClientCnxn: Server
 connection successful
 2010-03-28 20:24:28,292 WARN org.apache.zookeeper.ClientCnxn: Exception
 closing session 0x278bd16a96000d to sun.nio.
 ch.selectionkeyi...@3544d65e
 java.io.IOException: Session Expired
        at
 org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
        at
 org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
        at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
 2010-03-28 20:24:28,293 ERROR
 org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
 expired

 TIA,
 Peter



Re: Region assignment in Hbase

2010-03-29 Thread Jean-Daniel Cryans
Inline.

J-D

On Mon, Mar 29, 2010 at 11:45 AM, john smith js1987.sm...@gmail.com wrote:
 Hi all,

 I read the issue HBASE-57 ( https://issues.apache.org/jira/browse/HBASE-57 ).
 I don't really understand the use of assigning regions keeping DFS in
 mind. Can anyone give an example use case showing its advantages?

A region is composed of files, and files are composed of blocks. To read
data, you need to fetch those blocks. In HDFS you normally have access
to 3 replicas and you fetch one of them over the network. If one of
the replicas is on the local datanode, you don't need to go through the
network at all. This means less network traffic and better response time.
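
If you are curious where the replicas of a given store file actually
live, the plain HDFS client API will tell you. A quick sketch (the path
below is a made-up example; point it at one of your region's files):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereAreMyBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical store file path; use a real one from your /hbase dir.
    Path p = new Path("/hbase/mytable/1028785192/family/7236582355123654231");
    FileStatus status = fs.getFileStatus(p);
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      // Each block lists the datanodes that hold one of its replicas.
      System.out.println(java.util.Arrays.toString(loc.getHosts()));
    }
  }
}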

 Can
 map-reduce exploit this advantage in any way (if data is distributed in the
 above manner), or is it just the read/write performance that gets improved?

MapReduce works in the exact same way, it always tries to put the
computation next to where the data is. I recommend reading the
MapReduce tutorial
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Overview

 Can some one please help me in understanding this.

 Regards
 JS



Re: RegionServer Aborting

2010-03-26 Thread Jean-Daniel Cryans
Your region server log is missing the reason for the abort, but if you
had the following error in the DN log then it probably means that the
RS aborted because it wasn't able to write into HDFS. Since HBase
doesn't have any insight into why it's not able to contact a DN, it
prefers the paranoid way and shuts itself down.

If you search the mailing lists for that error, you will probably
stumble upon the following configuration:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>

This is set in hdfs-site.xml; it's a config I personally use, and I
haven't seen that problem on my clusters since.

Hope this helps,

J-D

2010/3/26  y_823...@tsmc.com:
 HDFS log

 java.net.SocketTimeoutException: 480000 millis timeout while waiting for
 channel to be ready for write. ch :
 java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010
 remote=/10.81.47.35:34325]
  at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
  at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
  at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
  at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
  at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
  at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
  at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
  at java.lang.Thread.run(Thread.java:619)

 2010-03-26 15:53:30,910 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode:
 DatanodeRegistration(10.81.47.50:50010,
 storageID=DS-758373957-10.81.47.50-50010-1264018078483, infoPort=50075,
 ipcPort=50020):DataXceiver
 java.net.SocketTimeoutException: 480000 millis timeout while waiting for
 channel to be ready for write. ch :
 java.nio.channels.SocketChannel[connected local=/10.81.47.50:50010
 remote=/10.81.47.35:34325]
  at
 org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
  at
 org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
  at
 org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
  at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
  at
 org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
  at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
  at
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
  at java.lang.Thread.run(Thread.java:619)




 Fleming Chiu(邱宏明)
 707-6128
 y_823...@tsmc.com
  Meat-free Monday, go vegetarian to save the Earth (Meat Free Monday Taiwan)





  From: y_823...@tsmc.com
  To: hbase-user@hadoop.apache.org
  Date: 2010/03/26 05:06 PM
  Cc: (bcc: Y_823910/TSMC)
  Subject: RegionServer Aborting
  (Please respond to hbase-user)






 Hi,

 I didn't send any command to shut down my region server,
 so I don't know why it shut down automatically.
 Any ideas?


 HBase version : 0.20.2, r834515

 Hadoop version:  0.20.1, r810220

 2010-03-26 15:56:59,330 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
 10.81.47.50:60020
 2010-03-26 15:57:01,797 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
 2010-03-26 15:57:01,797 INFO org.apache.zookeeper.ZooKeeper: Closing
 session: 0x1279807e42c0003
 2010-03-26 15:57:01,797 INFO org.apache.zookeeper.ClientCnxn: Closing
 ClientCnxn for session: 0x1279807e42c0003
 2010-03-26 15:57:01,800 INFO org.apache.zookeeper.ClientCnxn: Exception
 while closing send thread for session 0x1279807e42c0003 : Read error rc =
 -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
 ClientCnxn for session: 0x1279807e42c0003
 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ZooKeeper: Session:
 0x1279807e42c0003 closed
 2010-03-26 15:57:01,915 DEBUG
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with
 ZooKeeper
 2010-03-26 15:57:01,915 INFO org.apache.zookeeper.ClientCnxn: EventThread
 shut down
 2010-03-26 15:57:02,024 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 regionserver/10.81.47.50:60020 exiting
 2010-03-26 15:57:06,669 INFO org.apache.hadoop.hbase.Leases:
 regionserver/10.81.47.50:60020.leaseChecker closing leases
 2010-03-26 15:57:06,669 INFO org.apache.hadoop.hbase.Leases:
 regionserver/10.81.47.50:60020.leaseChecker closed leases
 2010-03-26 15:57:06,670 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
 thread.
 2010-03-26 15:57:06,670 INFO
 

Re: Cannot open filename Exceptions

2010-03-25 Thread Jean-Daniel Cryans
4 CPUs seems ok, unless you are running 2-3 MR tasks at the same time.

So your value for the timeout is 240000, but did you change the tick
time? The GC pause you got seemed to last almost a minute which, if
you did not change the tick value, matches the effective cap of
3000 ms * 20 = 60000 ms, i.e. one minute (your requested session
timeout is disregarded beyond that).

J-D

On Thu, Mar 25, 2010 at 1:07 AM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello J-D,
  Thank you for your reply first.
  How many CPUs do you have?
  Every server has 2 Dual-Core cpus.
  Are you swapping?
  Now I'm not sure about it from our monitoring tools, but we have now written
 a script to record vmstat output every 2 seconds. If something goes wrong
 again, we can check it.
  Also, if you are only using this system to batch load
  data or as an analytics backend, you probably want to set the timeout
  higher:
  But our value of this property is already 240000.

  We will try to optimize our garbage collector and we will see what will
 happen.
  Thanks again, J-D,
    LvZheng

 2010/3/25 Jean-Daniel Cryans jdcry...@apache.org

 2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We
 slept 54963ms, ten times longer than scheduled: 3000

 You had an important garbage collector pause (aka a "stop the world"
 pause in java-speak) and your region server's session with zookeeper expired
 (it literally stopped responding for too long, so long that it was
 considered dead). Are you swapping? How many CPUs do you have? If you
 are slowing down the garbage collecting process, it will take more
 time.

 Also, if you are only using this system to batch load
 data or as an analytics backend, you probably want to set the timeout
 higher:

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZooKeeper session timeout.
      HBase passes this to the zk quorum as suggested maximum time for a
      session.  See
      http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
      The client sends a requested timeout, the server responds with the
      timeout that it can give the client. The current implementation
      requires that the timeout be a minimum of 2 times the tickTime
      (as set in the server configuration) and a maximum of 20 times
      the tickTime. Set the zk ticktime with
      hbase.zookeeper.property.tickTime.
      In milliseconds.
    </description>
  </property>

 This value can only be 20 times bigger than this:

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>3000</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of milliseconds of each tick.  See
    zookeeper.session.timeout description.
    </description>
  </property>


 So you could set tick to 6000, timeout to 120000 for a 2min timeout.



Re: Which Hadoop and Hbase for stability

2010-03-25 Thread Jean-Daniel Cryans
 I meant Hadoop; in HBase the svn structure is obvious.

Doh! Hadoop 0.20 is pre-split so the whole thing is there
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20/

 They are. We've also backported other patches that are in 0.21. CDH3 is
 easier than having to deal with applying your own patches and building
 Hadoop yourself.

 So do you advise a clean CDH3 release, or should I also apply the patches from
  http://archive.cloudera.com/cdh-3-dev-builds/hbase/ and build it myself?
 As far as I can see, the changes done by Cloudera don't include the patches
 from cdh-3-dev-builds/hbase?

I'll let them answer.


 Thanks,
 MP



Re: ported lucandra: lucene index on HBase

2010-03-25 Thread Jean-Daniel Cryans
That sounds great Thomas! You can start by adding an entry here
http://wiki.apache.org/hadoop/SupportingProjects

WRT becoming an HBase contrib, we have a rule that at least one
committer (or a very active contributor) must be in charge and be
available to fix anything broken in it due to changes in core HBase.
For example, if a contrib doesn't compile before a release, we will
exclude it.

J-D

On Thu, Mar 25, 2010 at 2:42 AM, Thomas Koch tho...@koch.ro wrote:
 Hi,

 Lucandra stores a lucene index on cassandra:
 http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend

 As the author of lucandra writes: I’m sure something similar could be built
 on hbase.

 So here it is:
 http://github.com/thkoch2001/lucehbase

 This is only a first prototype which has not been tested on anything real yet.
 But if you're interested, please join me to get it production ready!

 I propose to keep this thread on hbase-user and java-dev only.
 Would it make sense to aim this project to become an hbase contrib? Or a
 lucene contrib?

 Best regards,

 Thomas Koch, http://www.koch.ro



Re: Bulk import, HFiles, Multiple reducers and TotalOrderPartitioner

2010-03-25 Thread Jean-Daniel Cryans
Ruslan,

I see you did all the required homework but this mail is really hard
to read. Can you create a jira
(http://issues.apache.org/jira/browse/HBASE) and attach all the code?
This will also make it easier to track.

thx!

J-D

On Wed, Mar 24, 2010 at 6:02 PM, Ruslan Salyakhov rusla...@gmail.com wrote:
 Hi!

 I'm trying to use the bulk import that writes HFiles directly into HDFS, and I
 have a problem with multiple reducers. If I run the MR job to prepare HFiles with
 more than one reducer, then some values for keys do not appear in the
 table after the loadtable.rb script execution. With one reducer everything works
 fine. Let's take a look at the details:

 Environment:
 - Hadoop 0.20.1
 - HBase release 0.20.3

 http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
 - the row id must be formatted as a ImmutableBytesWritable
 - MR job should ensure a total ordering among all keys

 http://issues.apache.org/jira/browse/MAPREDUCE-366 (patch-5668-3.txt)
 - TotalOrderPartitioner that uses the new API

 https://issues.apache.org/jira/browse/HBASE-2063
 - patched HFileOutputFormat

 Sample data of my keys:
 1.3.SWE.AB.-1.UPPLANDS-VASBY.1.1.0.1
 1.306.CAN.ON.-1.LONDON.1.1.0.1
 1.306.USA.CO.751.FT COLLINS.1.1.1.0
 1.306.USA.CO.751.LITTLETON.1.1.1.0
 4.6.USA.TX.623.MUENSTER.1.1.0.0
 4.67.USA.MI.563.GRAND RAPIDS.1.1.0.0
 4.68.USA.CT.533.WILLINGTON.1.1.1.0
 4.68.USA.LA.642.LAFAYETTE.1.1.1.0
 4.9.USA.CT.501.STAMFORD.1.1.0.0
 4.9.USA.NJ.504.PRINCETON.1.1.0.1
 4.92.USA.IN.527.INDIANAPOLIS.1.1.0.0

 I've put everything together:

 1) Test of TotalOrderPartitioner that checks how it works with my keys.
 Note that I've set up my comparator to pass that test:
 conf.setClass("mapred.output.key.comparator.class", MyKeyComparator.class,
 Object.class);

 import java.io.IOException;
 import java.util.ArrayList;

 import junit.framework.TestCase;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.SequenceFile;
 import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

 public class TestTotalOrderPartitionerForHFileKeys extends TestCase {

    private static final ImmutableBytesWritable[] splitKeys = new
 ImmutableBytesWritable[] {
            // -inf
                    // 0
            new
 ImmutableBytesWritable(Bytes.toBytes(0.27.USA.OK.650.FAIRVIEW.1.1.0.1)),
        // 1
            new
 ImmutableBytesWritable(Bytes.toBytes(0.430.USA.TX.625.Rollup.1.1.0.0)),
        // 2
            new ImmutableBytesWritable(Bytes.toBytes(0.9.USA.NY.501.NEW
 YORK.1.1.0.0)),         // 3
            new
 ImmutableBytesWritable(Bytes.toBytes(1.103.USA.DC.511.Rollup.1.1.0.0)),
        // 4
            new
 ImmutableBytesWritable(Bytes.toBytes(1.11.CAN.QC.-1.MONTREAL.1.1.1.0)),
        // 5
            new
 ImmutableBytesWritable(Bytes.toBytes(1.220.USA.NC.Rollup.Rollup.1.1.1.0)),
    // 6
            new
 ImmutableBytesWritable(Bytes.toBytes(1.225.USA.Rollup.Rollup.Rollup.1.1.0.1)),//
 7
            new
 ImmutableBytesWritable(Bytes.toBytes(1.245.ZAF.WC.-1.PAROW.1.1.0.1)),
    // 8
            new ImmutableBytesWritable(Bytes.toBytes(1.249.USA.MI.513.BAY
 CITY.1.1.0.0))         // 9
    };

     private static final ArrayList<CheckImmutableBytesWritable> testKeys =
  new ArrayList<CheckImmutableBytesWritable>();
    static {
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(0.10.USA.CA.825.SAN DIEGO.1.1.0.1)), 0));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(0.103.FRA.J.-1.PARIS.1.1.0.1)), 0));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(0.3.GBR.SCT.826032.PERTH.1.1.0.1)), 1));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(0.42.GBR.ENG.Rollup.Rollup.1.1.0.1)), 1));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(0.7.USA.CA.807.SANTA CLARA.1.1.0.0)), 2));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(1.10.SWE.AB.-1.STOCKHOLM.1.1.0.0)), 3));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(1.108.ABW.Rollup.Rollup.Rollup.1.1.0.0)), 4));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(1.11.CAN.NB.-1.SACKVILLE.1.1.0.1)), 4));
        testKeys.add(new CheckImmutableBytesWritable(new
 ImmutableBytesWritable(Bytes
                .toBytes(1.11.CAN.Rollup.Rollup.Rollup.1.1.0.0)), 5));
        

Re: Batch query?

2010-03-25 Thread Jean-Daniel Cryans
Not yet: http://issues.apache.org/jira/browse/HBASE-1845
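
Until that lands, the usual workaround is to just loop over single Gets
(or fan them out over a few threads if latency matters). A minimal
sketch; the table and row names are placeholders:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PoorMansBatchGet {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    List<Result> results = new ArrayList<Result>();
    for (String row : new String[] { "row1", "row2", "row3" }) {
      // One RPC per Get, so this is O(n) round trips until HBASE-1845.
      results.add(table.get(new Get(Bytes.toBytes(row))));
    }
    System.out.println("fetched " + results.size() + " rows");
  }
}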

J-D

On Thu, Mar 25, 2010 at 1:29 PM, Geoff Hendrey ghend...@decarta.com wrote:

 Is there  a way to submit multiple Get queries in a batch?


 -geoff




Re: Which Hadoop and Hbase for stability

2010-03-24 Thread Jean-Daniel Cryans
 What about doing the same with hadoop - using trunk?

The reasons were presented at the last HUG, see Jonathan's and Todd's
presentations http://wiki.apache.org/hadoop/HBase/HBasePresentations

 As far as i can see there is no branch for 0.20 or is it simply trunk?

http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.20/

 Why aren't all patches from the Cloudera pack available in jira?

They usually are or will be AFAIK. I'll let them answer.

 As far as I can see, not all of the available ones were applied to the repository
 version either. Are they not reviewed/accepted?

HDFS-200 for example will never be applied, in hadoop 0.21 and the
next releases we will be depending on HDFS-265. The presentations I
linked give more details. The rest are work in progress or in the same
situation as HDFS-200.


 Which version of the hadoop jar should I include in hbase?

The one that you will be using?

 Is the one shipped with trunk appropriate for the patched hadoop?

So is the one in branch IIRC.


 Lots of questions ;)

 Thanks for help
 Michal


 2010/3/24 Jean-Daniel Cryans jdcry...@apache.org:
 0.20.4 will contain the necessary improvements required to use
 HDFS-200 in an efficient way, so instead of starting on 0.20.3 you
 should instead checkout the head of the 0.20 branch.

 Currently there's no released hadoop version for HDFS-200, but
 Cloudera made it public that CDH3 (or some version of it) will contain
 the necessary hadoop patches for HBase. You can see
 http://archive.cloudera.com/cdh-3-dev-builds/hbase/ for their current
 list of hadoop patches to support HBase.

 I think we should soon release a beta of HBase 0.20.4 since it already
 contains tons of improvements and offers data durability when used
 with HDFS-200.

 J-D

 2010/3/24 Michał Podsiadłowski podsiadlow...@gmail.com:
 Hi Hbase fans,

 I'm trying to prepare as stable a version of our HBase/Hadoop stack as
 possible. This involves adding some patches.
 As a base I want to use hbase 0.20.3 and hadoop 0.20.2.
 Which patches should I apply for HDFS and HBase?

 AFAIK for hadoop
 https://issues.apache.org/jira/browse/HDFS-200
 https://issues.apache.org/jira/browse/HDFS-127
 https://issues.apache.org/jira/browse/HDFS-826

 For hbase
 https://issues.apache.org/jira/browse/HBASE-2244 which we already had
 pleasure to experience ;)

 Is there anything else available which improves handling of disaster
 situations ;)?
 Are there any patches which are only client- or server-specific?
 Which patches are bundled into the hadoop that is shipped with hbase?
 (HDFS-826, HDFS-200, anything else?)
 Is there any already-patched hadoop version available and ready to run
 as a base for hbase?


 Thanks for help
 Michal





Re: Cannot open filename Exceptions

2010-03-24 Thread Jean-Daniel Cryans
2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 54963ms, ten times longer than scheduled: 3000

You had an important garbage collector pause (aka a "stop the world"
pause in java-speak) and your region server's session with zookeeper expired
(it literally stopped responding for too long, so long that it was
considered dead). Are you swapping? How many CPUs do you have? If you
are slowing down the garbage collecting process, it will take more
time.

Also, if you are only using this system to batch load
data or as an analytics backend, you probably want to set the timeout
higher:

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZooKeeper session timeout.
      HBase passes this to the zk quorum as suggested maximum time for a
      session.  See
      http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
      The client sends a requested timeout, the server responds with the
      timeout that it can give the client. The current implementation
      requires that the timeout be a minimum of 2 times the tickTime
      (as set in the server configuration) and a maximum of 20 times
      the tickTime. Set the zk ticktime with hbase.zookeeper.property.tickTime.
      In milliseconds.
    </description>
  </property>

This value can only be 20 times bigger than this:

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>3000</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of milliseconds of each tick.  See
    zookeeper.session.timeout description.
    </description>
  </property>


So you could set tick to 6000, timeout to 120000 for a 2min timeout.
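
In other words, something like this in hbase-site.xml (the values are
just the example above, tune them to your workload):

  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
  </property>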

J-D

On Wed, Mar 24, 2010 at 8:01 PM, Zheng Lv lvzheng19800...@gmail.com wrote:
 Hello Stack,
  Yesterday we got another problem: a zookeeper session expired,
 leading to an RS shutdown, which never happened before.
  I googled it and found some docs about it, but did not get a clear
 picture of how it happened and how to avoid it.
  Now I have put the corresponding logs at
 http://rapidshare.com/files/367820690/208-0324.log.html.
  I look forward to your reply.
  Thank you.
    LvZheng

 2010/3/24 Zheng Lv lvzheng19800...@gmail.com

 Hello Stack,
   Thank you for your explanations, they are very helpful.
   If I find something new, I'll contact you.
   Regards,
     LvZheng

 2010/3/24 Stack st...@duboce.net

  On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv lvzheng19800...@gmail.com
 wrote:
  Hello Stack,
   So, for sure ugly stuff is going on.  I filed
   https://issues.apache.org/jira/browse/HBASE-2365.  It looks like
 we're
   doubly assigning a region.
   Can you tell me how this happened in detail? Thanks a lot.
 

 Yes.

 Splits are run by the regionserver.  It figures a region needs to be
 split and goes ahead closing the parent and creating the daughter
 regions.  It then adds edits to the meta table offlining the parent
 and inserting the two new daughter regions.  Next it sends a message
 to the master telling it that a region has been split.   The message
 contains names of the daughter regions.  On receipt of the message,
 the master adds the new daughter regions to the unassigned regions
 list so they'll be passed out the next time a regionserver checks in.

 Concurrently, the master is running a scan of the meta table every
 minute making sure all is in order.  One thing it does is if it finds
 unassigned regions, it'll add them to the unassigned regions (this
 process is what gets all regions assigned after a startup).

 In your case, whats happening is that there is a long period between
 the add of the new split regions to the meta table and the report of
 split to the master.  During this time, the master meta scan ran,
 found one of the daughters and went and assigned it.  Then the split
 message came in and the daughter was assigned again!

 There was supposed to be protection against this happening IIRC.
 Looking at responsible code, we are trying to defend against this
 happening in ServerManager:

  /*
   * Assign new daughter-of-a-split UNLESS its already been assigned.
   * It could have been assigned already in rare case where there was a
 large
   * gap between insertion of the daughter region into .META. by the
   * splitting regionserver and receipt of the split message in master (See
   * HBASE-1784).
   * @param hri Region to assign.
   */
  private void assignSplitDaughter(final HRegionInfo hri) {
    MetaRegion mr =
 this.master.regionManager.getFirstMetaRegionForRegion(hri);
    Get g = new Get(hri.getRegionName());
    g.addFamily(HConstants.CATALOG_FAMILY);
    try {
      HRegionInterface server =
        master.connection.getHRegionConnection(mr.getServer());
      Result r = server.get(mr.getRegionName(), g);
      // If size > 3 -- presume regioninfo, startcode and server -- then
  presume
      // that this daughter already assigned and return.
      if (r.size() >= 3) return;
    } catch 

Re: Why do we need the historian column family in .META. table?

2010-03-23 Thread Jean-Daniel Cryans
That was a family used to keep track of region operations like open,
close, compact, etc. It proved to be more troublesome than handy so we
disabled this feature until coming up with a better solution. The
family stayed for backward compatibility.

J-D

On Tue, Mar 23, 2010 at 6:50 PM, ChingShen chingshenc...@gmail.com wrote:
 Hi,

  I saw a historian column family in .META. table, but in what situation do
 we need the column family? thanks.

 Shen



Re: Adding filter to scan at remote client causes UnknownScannerException

2010-03-22 Thread Jean-Daniel Cryans
Alex,

Good job on finding out your issue, which boils down to our mistake as
hbase devs. 0.20.3 included fixes for the filters and changed their
readFields/write behavior. We should either 1) not have committed that,
or 2) have bumped the RPC version.

I ran cross-version tests before releasing 0.20.3 but I did not verify
filters. This could probably be automated with our EC2 tests we are
planning (wink Andrew Purtell), eg running all our tests with
different versions of clients and servers.

J-D

On Mon, Mar 22, 2010 at 5:17 AM, Alex Baranov alex.barano...@gmail.com wrote:
 Found the mistake. It was mine, sorry.

 The error was caused by using the HBase 0.20.2 jar on the client end (and
 0.20.3 on the server end). Although I put the proper version in the classpath of
 the java command (I copied the run command here previously), the manifest file in
 my client app jar had a link to the hbase 0.20.2 jar.
 (Btw, this happened because I'm using maven in development, and a while back I
 added a dependency on HBase 0.20.2 since it was (and still is) the only
 one available in public maven repos).


 Thank you for the support: your check made me clean up and double-check
 everything.
 Alex.

 On Mon, Mar 22, 2010 at 11:31 AM, Alex Baranov 
 alex.barano...@gmail.comwrote:

 It hangs for some time. I'm not using any contribs.

 Thanks for the help,
 Alex.


 On Fri, Mar 19, 2010 at 11:29 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 Alex,

 I tried your code from a remote machine to a pseudo-distributed setup
 and it worked well (on trunk, didn't have a 0.20.3 setup around). When
 the call fails, doesn't it return right away or it hangs for some
 time? Also are you using any contrib?

 Thx

 J-D

 On Fri, Mar 19, 2010 at 5:28 AM, Alex Baranov alex.barano...@gmail.com
 wrote:
  Hello J-D,
 
  Thanks for helping me out!
 
  Here is the code that works if I run it on machine that has HBase master
 on
  it and doesn't work on remote client box:
 
  // CODE BEGINS
 
     HBaseConfiguration conf = new HBaseConfiguration();
     HTable hTable = new HTable(conf, "agg9");

     Scan scan = new Scan();
     scan.setStartRow(Bytes.toBytes("qf|byday_bytype_|14656__|"));

     FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);

     SingleColumnValueFilter filter = new SingleColumnValueFilter(
         Bytes.toBytes("agg"), Bytes.toBytes("count"),
         CompareFilter.CompareOp.GREATER, Bytes.toBytes(35));
     filters.addFilter(filter);

     InclusiveStopFilter stopFilter = new InclusiveStopFilter(
         Bytes.toBytes("qf|byday_|14739_|"));
     filters.addFilter(stopFilter);
     scan.setFilter(filters);

     ResultScanner rs = hTable.getScanner(scan);
     Result next = rs.next();
     int readCount = 0;
     while (next != null && readCount < 40) {
       System.out.println("Row key: " + Bytes.toString(next.getRow()));
       System.out.println("count: " + Bytes.toInt(
           next.getValue(Bytes.toBytes("agg"), Bytes.toBytes("count"))));
       next = rs.next();
       readCount++;
     }
 
  // CODE ENDS
 
  If I comment the line
     filters.addFilter(filter);
  then the code works on remote client box as well.
 
 
  Client fails with the exception I provided previously. The bigger master
 log
  (this is the log till the end and started when I ran the client code):
 
  2010-03-19 12:15:02,098 INFO
 org.apache.hadoop.hbase.master.ServerManager: 1
  region servers, 0 dead, average load 27.0
  2010-03-19 12:15:41,369 INFO org.apache.hadoop.hbase.master.BaseScanner:
  RegionManager.metaScanner scanning meta region {server:
 10.210.71.80:39207,
  regionname: .META.,,1, startKey: }
  2010-03-19 12:15:41,398 INFO org.apache.hadoop.hbase.master.BaseScanner:
  RegionManager.metaScanner scan of 25 row(s) of meta region {server:
  10.210.71.80:39207, regionname: .META.,,1, startKey: } complete
  2010-03-19 12:15:41,398 INFO org.apache.hadoop.hbase.master.BaseScanner:
 All
  1 .META. region(s) scanned
  2010-03-19 12:15:41,548 INFO org.apache.hadoop.hbase.master.BaseScanner:
  RegionManager.rootScanner scanning meta region {server:
 10.210.71.80:39207,
  regionname: -ROOT-,,0, startKey: }
  2010-03-19 12:15:41,549 INFO org.apache.hadoop.hbase.master.BaseScanner:
  RegionManager.rootScanner scan of 1 row(s) of meta region {server:
  10.210.71.80:39207, regionname: -ROOT-,,0, startKey: } complete
  2010-03-19 12:15:45,398 DEBUG
  org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
  Total=43.474342MB (45586152), Free=156.21317MB (163801368),
 Max=199.6875MB
  (209387520), Counts: Blocks=684, Access=90114, Hit=80498, Miss=9616,
  Evictions=0, Evicted=0, Ratios: Hit Ratio=89.32906985282898%, Miss
  Ratio=10.670927911996841%, Evicted/Run=NaN
  2010-03-19 12:16:02,108 INFO
 org.apache.hadoop.hbase.master.ServerManager: 1
  region servers, 0 dead, average load 27.0
  2010-03-19 12:16:41,378 INFO org.apache.hadoop.hbase.master.BaseScanner:
  RegionManager.metaScanner scanning meta region {server:
 10.210.71.80:39207

Re: Adding filter to scan at remote client causes UnknownScannerException

2010-03-19 Thread Jean-Daniel Cryans
        0 2010-02-04 14:37
 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.3
 -rw-r--r-- 1 ubuntu ubuntu        0 2010-02-04 08:15
 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.4
 -rw-r--r-- 1 ubuntu ubuntu        0 2010-02-04 07:56
 hbase-ubuntu-zookeeper-domU-12-31-39-09-40-A2.out.5

 Is there anything else I can provide you?

 The command I'm using to run the client is the following (you can see that
 0.20.3 version of HBase is used and also other versions you might be
 interested in):
 java -cp
 commons-cli-1.2.jar:commons-logging-1.1.1.jar:hadoop-0.20.1-core.jar:hbase-0.20.3.jar:hbase-0.20.3-test.jar:log4j-1.2.15.jar:test-1.0-SNAPSHOT.jar:zookeeper-3.2.2.jar
 com.foo.bar.client.ClientExample

 Thank you for your help!
 Alex

 On Thu, Mar 18, 2010 at 7:13 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 Alex,

 Is there anything else in the region server logs before that like
 lease expirations? Can we see a much bigger log? Also is there
 anything in the .out file? Can you post a snippet of the code you are
 using?

 Thx

 J-D

 On Wed, Mar 17, 2010 at 11:25 PM, Alex Baranov alex.barano...@gmail.com
 wrote:
   To give more clarity, I'm using a *standard* filter (SingleColumnValueFilter),
   not a custom one. So it's not related to classpath issues.
 
  Any help is very appreciated!
 
  Thanks,
  Alex.
 
  On Wed, Mar 17, 2010 at 6:00 PM, Alex Baranov alex.barano...@gmail.com
 wrote:
 
  Hello guys,
 
  I've got a problem while adding a filter to scanner in a client app
 which
  runs on the remote (not the one from HBase cluster) box. The same code
 works
  well and scan result is fetched very quickly if I run the client on the
 same
  box where HBase master resides. If I comment out adding filter then the
  scanner returns results. But with filter it keeps showing me the error
  below.
 
  I'm using HBase 0.20.3 on both ends.
 
  On the mailing list I saw that problems like this can arise when using
  different versions of HBase on server and on client, but this is not the
  case. Also the error like this can show up when it takes a lot of time
 to
  initialize scanner (lease time by default is 1 min), but I assume this
 is
  also not the case since without adding filter I got results very
 quickly.
 
  Does anyone have an idea what is going on?
 
  - in log on remote client side:
 
  Exception in thread main
  org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
 contact
  region server 10.210.71.80:39207 for region x,,1267450079067, row
  'y', but failed after 10 attempts.
  Exceptions:
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
  java.io.IOException: Call to /10.210.71.80:39207 failed on local
  exception: java.io.EOFException
 
      at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
      at
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1931)
      at
 
 org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1851)
      at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:372)
      at
 
 com.sematext.sa.client.AggregatesAccessor.getResult(AggregatesAccessor.java:74)
      at com.sematext.sa.client.ClientExample.main(ClientExample.java:41)
 
 
  - in HBase master log:
 
  2010-03-17 12:37:45,068 ERROR
  org.apache.hadoop.hbase.regionserver.HRegionServer:
  org.apache.hadoop.hbase.UnknownScannerException: Name: -1
          at
 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1877)
          at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
          at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at
  org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
          at
 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 
  - in HBase region server log/out: nothing
 
  Thank you in advance.
  Alex.
 
 




Re: Adding filter to scan at remote client causes UnknownScannerException

2010-03-18 Thread Jean-Daniel Cryans
Alex,

Is there anything else in the region server logs before that like
lease expirations? Can we see a much bigger log? Also is there
anything in the .out file? Can you post a snippet of the code you are
using?

Thx

J-D

On Wed, Mar 17, 2010 at 11:25 PM, Alex Baranov alex.barano...@gmail.com wrote:
 To give more clarity, I'm using a *standard* filter (SingleColumnValueFilter),
 not a custom one. So it's not related to classpath issues.

 Any help is very appreciated!

 Thanks,
 Alex.

 On Wed, Mar 17, 2010 at 6:00 PM, Alex Baranov alex.barano...@gmail.comwrote:

 Hello guys,

 I've got a problem while adding a filter to scanner in a client app which
 runs on the remote (not the one from HBase cluster) box. The same code works
 well and scan result is fetched very quickly if I run the client on the same
 box where HBase master resides. If I comment out adding filter then the
 scanner returns results. But with filter it keeps showing me the error
 below.

 I'm using HBase 0.20.3 on both ends.

 On the mailing list I saw that problems like this can arise when using
 different versions of HBase on server and on client, but this is not the
 case. Also the error like this can show up when it takes a lot of time to
 initialize scanner (lease time by default is 1 min), but I assume this is
 also not the case since without adding filter I got results very quickly.

 Does anyone have an idea what is going on?

 - in log on remote client side:

 Exception in thread main
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
 region server 10.210.71.80:39207 for region x,,1267450079067, row
 'y', but failed after 10 attempts.
 Exceptions:
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException
 java.io.IOException: Call to /10.210.71.80:39207 failed on local
 exception: java.io.EOFException

     at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
     at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1931)
     at
 org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1851)
     at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:372)
     at
 com.sematext.sa.client.AggregatesAccessor.getResult(AggregatesAccessor.java:74)
     at com.sematext.sa.client.ClientExample.main(ClientExample.java:41)


 - in HBase master log:

 2010-03-17 12:37:45,068 ERROR
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 org.apache.hadoop.hbase.UnknownScannerException: Name: -1
         at
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1877)
         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
         at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
         at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 - in HBase region server log/out: nothing

 Thank you in advance.
 Alex.




Re: New Datanode won't start, null pointer exception

2010-03-18 Thread Jean-Daniel Cryans
Google is your friend ;)

https://issues.apache.org/jira/browse/HADOOP-5687
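
If I remember that issue correctly, it is about the unhelpful NPE you
get when fs.default.name is missing or is not an hdfs:// URI on the
node. Assuming that is what is happening here, double-check that the new
box's core-site.xml really points at your namenode, e.g. (host and port
are placeholders):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>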

J-D

On Thu, Mar 18, 2010 at 1:29 PM, Scott skes...@weather.com wrote:
 We have a working 10-node cluster and are trying to add an 11th box (insert
 Spinal Tap joke here).  The box (CentOS Linux) was built in an identical
 manner to the other 10 and has the same version of hadoop (0.20.2).  The
 configs are exactly the same as on the other nodes.  However, when trying to start
 the hadoop daemons it throws an NPE.  Here is all that is written to the
 logs.  Any idea what's causing this?

 /
 2010-03-18 16:09:42,993 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = hadoop0b10/192.168.60.100
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.2
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
 911707; compiled by 'chrisdo' on Fri Feb
 19 08:07:34 UTC 2010
 /
 2010-03-18 16:09:43,058 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode:
 java.lang.NullPointerException
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
   at
 org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)
   at
 org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:246)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:216)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)

 2010-03-18 16:09:43,059 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down DataNode at hadoop0b10/192.168.60.100
 ************************************************************/




Re: Weird HBase Shell issue with count

2010-03-17 Thread Jean-Daniel Cryans
Zookeeper doesn't need _that_ much ;)

You say you are losing your zk server... can we see the error? Pastebin?

Thx

J-D

On Tue, Mar 16, 2010 at 11:48 PM, Michael Segel
michael_se...@hotmail.com wrote:

 Unfortunately I can't up the ulimit easily. :-( I'll have to get an admin to 
 do that.

 I did update the xceivers and set it to 2048 based on something I saw.
 But I'm losing my zookeeper on the node. Getting an IO error.
 I had the handler count high at 50 but reset it back down to 25 (default 
 value)

 From what I've read, I definitely will move the zookeeper nodes when I can 
 find additional machines to add to the cluster.

 Again any input welcome.

 Thx

 -Mike




 Date: Tue, 16 Mar 2010 20:30:27 -0800
 Subject: Re: Weird HBase Shell issue with count
 From: st...@duboce.net
 To: hbase-user@hadoop.apache.org

 Oh, you've read the 'getting started' and the hbase requirements where
 it specifies upping ulimit and xceivers in your cluster?
 St.Ack

 On Tue, Mar 16, 2010 at 8:29 PM, Stack st...@duboce.net wrote:
  Is DEBUG enabled in the log4j.properties that the client can see?  If
  not, enable it.  If so, can you see the regions loading as the count
  progresses?  Which region does it stop at?  Can you try to do a get on
  its startkey?  Does it work?
 
  St.Ack
 
  On Tue, Mar 16, 2010 at 8:25 PM, Michael Segel
  michael_se...@hotmail.com wrote:
 
  Ok,
 
  Still trying to track down some issues.
 
  I opened up an hbase shell and decided to use count  to count the number 
  of rows in a table.
 
  As it was running, count was flying along until it hit 150,000 then 
  stopped.
  Just stood there, nothing.
 
  I started to check the other nodes in the cloud to see what is happening 
  and the load on the data nodes, which are also region servers, jumped up,
  where one node jumped up to 2.71 ... other nodes saw some jump, but
  again it doesn't make sense why the count suddenly died.
 
  I'm going to check the logs, but has anyone seen something like this?
 
  Thx
 
  -Mike
 
 
 

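
Stack's suggestion above -- work out which region the count stalls at and try a get on its start key -- looks roughly like this with the 0.20-era Java client; the table name is a placeholder:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ProbeRegionStartKeys {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        byte[][] startKeys = table.getStartKeys();  // one entry per region, in key order
        for (byte[] startKey : startKeys) {
          if (startKey.length == 0) continue;  // the first region has an empty start key
          // If this call hangs or throws, the region holding that key is the one to look at.
          Result r = table.get(new Get(startKey));
          System.out.println(Bytes.toString(startKey) + " -> " +
              (r.isEmpty() ? "no row exactly at start key (fine)" : "row found"));
        }
      }
    }

An empty Result here is normal (there may be no row sitting exactly on a region boundary); the point is whether the RPC to that region comes back at all.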


Re: Analysing slow HBase mapreduce performance

2010-03-16 Thread Jean-Daniel Cryans
Did you set scanner caching higher?

J-D
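
For anyone else hitting this: with the defaults of that era (hbase.client.scanner.caching = 1) every map task makes one RPC per row, which on an 18M-row table dwarfs the actual read time. A sketch of raising the caching on the job's Scan with the 0.20 client API -- the table name, output path and job wiring are placeholders, not taken from Dmitry's job:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class DumpTable {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        Scan scan = new Scan();
        scan.setCaching(500);  // rows fetched per next() RPC instead of the default of 1

        Job job = new Job(conf, "dump-table");
        job.setJarByClass(DumpTable.class);
        // IdentityTableMapper passes each (row key, Result) pair straight through.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, IdentityTableMapper.class,
            ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);  // map-only: write straight to HDFS
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path("/tmp/mytable-dump"));
        job.waitForCompletion(true);
      }
    }

The same thing can be set cluster-wide with hbase.client.scanner.caching in hbase-site.xml; pick a value the row size and the region servers' heap can tolerate.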

On Tue, Mar 16, 2010 at 9:10 PM, Dmitry dmi...@tellapart.com wrote:
 Hi all,

 I'm trying to analyse some issues with HBase performance in a mapreduce.

 I'm running a mapreduce which reads a table and just writes it out to HDFS.
 The table is small, roughly ~400M of data and 18M rows.
 I've pre-split the table into 32 regions, so that I'm not running into the
 problem of having one region server serve the entire table.

 I'm running an HBase cluster with:
 - 16 region servers (each on the same machine as a Hadoop tasktracker and
 datanode).
 - 1 master (on the same machine as the Hadoop jobtracker and namenode.)
 - Zookeeper quorum of just 1 machine (on the same machine as the master).

 I have LZO compression enabled for both HBase and Hadoop.

 Running this job takes about 5-6 minutes.

 Running a mapreduce reading the exact same set of data from a SequenceFile
 on HDFS takes only about 1 minute.

 What else can I do to try to diagnose this?

 Thanks,

 - Dmitry



Re: Analysing slow HBase mapreduce performance

2010-03-16 Thread Jean-Daniel Cryans
Out of interest... to what did you set it and what was the speed-up like?

J-D

On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik dmi...@tellapart.com wrote:
 That did it. Thanks!

 On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans 
 jdcry...@apache.org wrote:

 Did you set scanner caching higher?

 J-D

 On Tue, Mar 16, 2010 at 9:10 PM, Dmitry dmi...@tellapart.com wrote:
  Hi all,
 
  I'm trying to analyse some issues with HBase performance in a mapreduce.
 
  I'm running a mapreduce which reads a table and just writes it out to
 HDFS.
  The table is small, roughly ~400M of data and 18M rows.
  I've pre-split the table into 32 regions, so that I'm not running into
 the
  problem of having one region server serve the entire table.
 
  I'm running an HBase cluster with:
  - 16 region servers (each on the same machine as a Hadoop tasktracker and
  datanode).
  - 1 master (on the same machine as the Hadoop jobtracker and namenode.)
  - Zookeeper quorum of just 1 machine (on the same machine as the master).
 
  I have LZO compression enabled for both HBase and Hadoop.
 
  Running this job takes about 5-6 minutes.
 
  Running a mapreduce reading the exact same set of data from a
 SequenceFile
  on HDFS takes only about 1 minute.
 
  What else can I do to try to diagnose this?
 
  Thanks,
 
  - Dmitry
 




Re: NoSuchColumnFamilyException

2010-03-12 Thread Jean-Daniel Cryans
Ted,

You aren't the first one to report that issue (I've seen 2 other people
report it in the last 2 weeks), so it looks like a real bug. Can you grep
around your hbase logs for ruletable,,1268431015006 and see if there's
any exception related to that region? Can you identify exactly when it
happened and what was happening?

Thx

J-D
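
Since the exception prints the descriptor the region server holds for that region (FAMILIES => []), it can also help to look at what .META. itself stores for ruletable and compare it with the describe output. A rough sketch against the 0.20-era client API -- illustrative only, the exact helper classes may differ slightly between releases:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Writables;

    public class DumpMetaForRuletable {
      public static void main(String[] args) throws Exception {
        HTable meta = new HTable(new HBaseConfiguration(), ".META.");
        // Region rows for 'ruletable' start at "ruletable,," in .META.
        ResultScanner scanner = meta.getScanner(new Scan(Bytes.toBytes("ruletable,,")));
        try {
          for (Result row : scanner) {
            byte[] bytes = row.getValue(HConstants.CATALOG_FAMILY,
                HConstants.REGIONINFO_QUALIFIER);
            if (bytes == null) continue;
            HRegionInfo hri = Writables.getHRegionInfo(bytes);
            if (!Bytes.toString(hri.getTableDesc().getName()).equals("ruletable")) break;
            // In the failing case this prints FAMILIES => [] for the region even
            // though describe 'ruletable' in the shell shows both families.
            System.out.println(hri.getRegionNameAsString() + " : " + hri.getTableDesc());
          }
        } finally {
          scanner.close();
        }
      }
    }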

On Fri, Mar 12, 2010 at 2:33 PM, Ted Yu yuzhih...@gmail.com wrote:
 Hi,
 When I tried to insert into ruletable, I saw:

 hbase(main):003:0> put 'ruletable', 'com.yahoo.www', 'lpm_1.0:category', '1123:1'
 NativeException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException:
 org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family lpm_1.0 does not
 exist in region ruletable,,1268431015006 in table {NAME => 'ruletable', FAMILIES => []}
        at
 org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2375)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1241)
        at
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 However:
 hbase(main):002:0> describe 'ruletable'
 DESCRIPTION
  {NAME => 'ruletable', FAMILIES => [{NAME => 'exactmatch_1.0', VERSIONS => '3',
  COMPRESSION => 'LZO', TTL => '1209600', TTU => '1123300', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'lpm_1.0', COMPRESSION => 'LZO', VERSIONS => '3', TTL => '15552000',
  TTU => '14688000', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

 Can someone explain the above scenario?

 Thanks



Re: Table left unresponsive after Thrift socket timeout

2010-03-11 Thread Jean-Daniel Cryans
Joe,

We'll need to learn what happened to that region, they usually don't
throw up after a few inserts ;)

So in that region server's log, before you tried disabling that table,
do you see anything wrong (exceptions probably)? If you have a web
server, it would be nice to drop the full RS log and the master log.

thx!

J-D

On Wed, Mar 10, 2010 at 5:54 PM, Joe Pepersack j...@pepersack.net wrote:
 On 03/10/2010 07:58 PM, Jean-Daniel Cryans wrote:

 Which HBase version? What's your hardware like? How much data were you
 inserting? Did you grep the region server logs for any IOException or
 such? Can we see an excerpt of those logs around the time of the lock
 up?


 Version: 0.20.3-1.cloudera
 Hardware: dual Xeon 4 core, 16G, 1.7T disk
 10x nodes: 1 master, 1 secondary master, 8x regionservers.  2x zookeepers
 running on regionservers


 It appears to have died after only a few rows were inserted.   There's only
 one region shown on the status page.  Curiously, that region does NOT show
 up in the list of online regions for the listed regionserver.

 Master log, from the point where I ran drop 'Person' in the shell:

 2010-03-10 20:44:44,812 INFO org.apache.hadoop.hbase.master.BaseScanner:
 RegionManager.rootScanner scanning meta region {server: 10.40.0.37:60020,
 regionname: -ROOT-,,0, startKey:}
 2010-03-10 20:44:44,815 INFO org.apache.hadoop.hbase.master.BaseScanner:
 RegionManager.rootScanner scan of 1 row(s) of meta region {server:
 10.40.0.37:60020, regionname: -ROOT-,,0, startKey:} complete
 2010-03-10 20:44:44,836 INFO org.apache.hadoop.hbase.master.BaseScanner:
 RegionManager.metaScanner scanning meta region {server: 10.40.0.36:60020,
 regionname: .META.,,1, startKey:}
 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner:
 RegionManager.metaScanner scan of 3 row(s) of meta region {server:
 10.40.0.36:60020, regionname: .META.,,1, startKey:} complete
 2010-03-10 20:44:44,844 INFO org.apache.hadoop.hbase.master.BaseScanner: All
 1 .META. region(s) scanned
 2010-03-10 20:44:45,357 INFO org.apache.hadoop.hbase.master.ServerManager: 5
 region servers, 0 dead, average load 1.2
 2010-03-10 20:45:03,209 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
 2010-03-10 20:45:03,209 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
 currently being served
 2010-03-10 20:45:03,210 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Adding region
 Person,,1268251509658 to setClosing list
 2010-03-10 20:45:04,260 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
 2010-03-10 20:45:04,260 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
 currently being served
 2010-03-10 20:45:04,260 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Adding region
 Person,,1268251509658 to setClosing list
 2010-03-10 20:45:05,273 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
 2010-03-10 20:45:05,273 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
 currently being served
 2010-03-10 20:45:05,273 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Adding region
 Person,,1268251509658 to setClosing list
 2010-03-10 20:45:06,287 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
 2010-03-10 20:45:06,287 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
 currently being served
 2010-03-10 20:45:06,287 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Adding region
 Person,,1268251509658 to setClosing list
 2010-03-10 20:45:08,301 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions
 2010-03-10 20:45:08,301 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Processing regions
 currently being served
 2010-03-10 20:45:08,301 DEBUG
 org.apache.hadoop.hbase.master.ChangeTableState: Adding region
 Person,,1268251509658 to setClosing list


 Log from the region server where the region is supposed to be for the same
 time frame:

 2010-03-10 20:43:50,889 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
 Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB
 (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
 Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
 2010-03-10 20:44:50,889 DEBUG
 org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
 Total=1.6213074MB (1700064), Free=195.8787MB (205393696), Max=197.5MB
 (207093760), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
 Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted/Run=NaN
 2010-03-10 20:45:04,058 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
 Person,,1268251509658
 2010-03-10 20:45:04,059 INFO
 org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
 MSG_REGION_CLOSE: Person,,1268251509658
 2010-03-10 20:45:05,062 INFO
