Re: Please welcome our newest committer and PMC member, Liyin Tang

2012-05-16 Thread Gaojinchao
Welcome! :)

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: May 17, 2012 12:04
To: HBase Dev List
Subject: Please welcome our newest committer and PMC member, Liyin Tang

One of us Liyin!

Keep up the great work.

Add yourself to the pom.xml team section (Don't break the build!).

St.Ack
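
For anyone following the same step, a developer entry in the Maven `<developers>` section of pom.xml looks roughly like this (the id, name, and timezone values below are placeholders, not the actual entry):

```xml
<!-- Placeholder entry; real values go into the existing <developers> list -->
<developer>
  <id>liyin</id>
  <name>Liyin Tang</name>
  <timezone>-8</timezone>
</developer>
```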


Re: HBase coprocessors blog posted

2012-02-01 Thread Gaojinchao
Great job! 
We will use the feature!

-----Original Message-----
From: Mingjie Lai [mailto:m...@apache.org]
Sent: February 1, 2012 16:26
To: u...@hbase.apache.org; dev@hbase.apache.org
Subject: HBase coprocessors blog posted

Hi hbasers.

A blog post about HBase coprocessors has been posted to the Apache blog site.
Here is the link:

https://blogs.apache.org/hbase/entry/coprocessor_introduction

Your comments are welcome.

Thanks,
Mingjie


Re: hbase 0.94.0

2012-01-26 Thread Gaojinchao
That is good news, but it makes choosing a version difficult. :)

Our current version is 0.90. We will choose our next version and start a
large-scale test.

Which version will be HBase 1.0?

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: January 26, 2012 12:35
To: HBase Dev List
Subject: hbase 0.94.0

Lets branch end of february?  No new features thereafter.  Is this too
close in?  Would be grand if 0.94.0 shipped before hbasecon.  What
should 0.94.0 have in it?  I don't mind if the list is short.

Unless someone else wants to, I don't mind being release manager
(will try to run a tighter ship this time around).

St.Ack


Re: Why writes a meta HRegionInfo for blockcache/InMemory is off to hdfs system?

2012-01-05 Thread Gaojinchao
Thanks for your reply.

I was checking the health of the cluster when I found this issue. I went through
the code and did not find any place that uses this information, so I suspect the
code is not useful. I wanted to know why it was done this way, but checking SVN
turned up nothing useful (maybe this code was merged a long time ago).


-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: January 6, 2012 13:07
To: dev@hbase.apache.org
Subject: Re: Why writes a meta HRegionInfo for blockcache/InMemory is off to hdfs system?

On Thu, Jan 5, 2012 at 3:01 AM, Gaojinchao  wrote:
> I found “IN_MEMORY => 'false', BLOCKCACHE => 'false'”  in hdfs system.
>
>       .META.,,1 .META.[1]IS_ROOTfalseIS_META

What you thinking Jinchao?  We should set IN_MEMORY true?  What else?
St.Ack


Why writes a meta HRegionInfo for blockcache/InMemory is off to hdfs system?

2012-01-05 Thread Gaojinchao
The code below does this when we start a cluster.

Code:
private static void bootstrap(final Path rd, final Configuration c)
  throws IOException {
LOG.info("BOOTSTRAP: creating ROOT and first META regions");
try {
  // Bootstrapping, make sure blockcache is off.  Else, one will be
  // created here in bootstap and it'll need to be cleaned up.  Better to
  // not make it in first place.  Turn off block caching for bootstrap.
  // Enable after.
  HRegionInfo rootHRI = new HRegionInfo(HRegionInfo.ROOT_REGIONINFO);
  setInfoFamilyCaching(rootHRI, false);
  HRegionInfo metaHRI = new HRegionInfo(HRegionInfo.FIRST_META_REGIONINFO);
  setInfoFamilyCaching(metaHRI, false);
  HRegion root = HRegion.createHRegion(rootHRI, rd, c);
  HRegion meta = HRegion.createHRegion(metaHRI, rd, c);
  setInfoFamilyCaching(rootHRI, true);
  setInfoFamilyCaching(metaHRI, true);

I found “IN_MEMORY => 'false', BLOCKCACHE => 'false'” in the files on HDFS.

[raw serialized .META. .regioninfo bytes omitted; the decoded region descriptor follows]

REGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 
1028785192, TABLE => {{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME 
=> 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 
'NONE', VERSIONS => '10', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY 
=> 'false', BLOCKCACHE => 'false'}]}}
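
Why do the persisted values read 'false'? The bootstrap sequence quoted above disables caching, writes the region descriptor out in that state, and only flips the in-memory copy back afterwards. A toy model of that ordering (the classes below are illustrative, not HBase's real HRegionInfo/HColumnDescriptor API):

```java
// Toy model of the bootstrap sequence: caching is turned off, the
// descriptor is serialized in that state (as .regioninfo would be written
// to HDFS), and only the in-memory copy is re-enabled afterwards.
public class Main {
    static class FamilyDescriptor {
        boolean inMemory = true;
        boolean blockCache = true;

        String serialize() { // stands in for persisting .regioninfo
            return "IN_MEMORY => '" + inMemory + "', BLOCKCACHE => '" + blockCache + "'";
        }
    }

    static String bootstrap() {
        FamilyDescriptor info = new FamilyDescriptor();
        info.inMemory = false;               // setInfoFamilyCaching(hri, false)
        info.blockCache = false;
        String persisted = info.serialize(); // createHRegion writes this out
        info.inMemory = true;                // setInfoFamilyCaching(hri, true)
        info.blockCache = true;              // ...but the file is already written
        return persisted;
    }

    public static void main(String[] args) {
        System.out.println(bootstrap()); // IN_MEMORY => 'false', BLOCKCACHE => 'false'
    }
}
```

So the on-disk flags reflect the bootstrap-time state, which matches what was observed on HDFS.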


HBase client is blocked forever

2011-12-16 Thread Gaojinchao
The client had a temporary network failure. After it recovered,
I found my client thread was blocked.
As the stack and logs below show, an invalid CatalogTracker is used in the
"tableExists" function.

Block stack:
"WriteHbaseThread33" prio=10 tid=0x7f76bc27a800 nid=0x2540 in Object.wait() 
[0x7f76af4f3000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
 - locked <0x7f7a67817c98> (a 
java.util.concurrent.atomic.AtomicBoolean)
 at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
 at 
org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
 at 
org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
 at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
Source)
 at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
 - locked <0x7f7a4c5dc578> (a com.huawei.hdi.hbase.HbaseReOper)
 at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
 at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)

In ZooKeeperNodeTracker, we don't throw the KeeperException up to the caller.
So at the CatalogTracker level, we think the ZooKeeperNodeTracker started
successfully and continue processing.

[WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ]  | 
hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to get 
data of znode /hbase/root-region-server | 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/root-region-server
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
 at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
 at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
 at 
org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
 at 
org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
 at 
org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
 at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
Source)
 at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
 at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
 at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
[WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR]  | 
hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received 
unexpected KeeperException, re-throwing exception | 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/root-region-server
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
 at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
 at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
 at 
org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
 at 
org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
 at 
org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
 at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
Source)
 at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
 at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
 at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)


[WriteHbaseThread33]2011-12-16 17:07:33,361[FATAL]  | Unexpected exception 
during initialization, aborting | 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1351)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/root-region-server
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
 at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
 at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperNo
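
The failure mode described above, where start() swallows the KeeperException so the caller waits on data that never arrives, can be sketched in isolation. The classes below are illustrative, not the real ZooKeeperNodeTracker/CatalogTracker API, and the wait is bounded by a timeout here, whereas the real waitForMeta call can block much longer:

```java
import java.io.IOException;

// start() swallows the connection error, so the caller believes startup
// succeeded and then waits for znode data that will never arrive.
public class Main {
    static class NodeTracker {
        volatile byte[] data; // would hold the /hbase/root-region-server znode

        void start() {
            try {
                readZnode(); // fails: the ZooKeeper connection is lost
            } catch (IOException e) {
                // BUG: exception logged and swallowed; caller is not told
            }
        }

        void readZnode() throws IOException {
            throw new IOException("KeeperErrorCode = ConnectionLoss");
        }

        synchronized byte[] waitForData(long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            try {
                while (data == null && System.currentTimeMillis() < deadline) {
                    wait(10); // nothing will ever set 'data'
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return data;
        }
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.start(); // appears to succeed despite the lost connection
        System.out.println("znode data after wait: " + tracker.waitForData(50)); // null
    }
}
```

Propagating the exception from start(), or checking a "started successfully" flag before waiting, would avoid the indefinite block.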

Re: The clusters can't provide services because Region can't flush.

2011-12-11 Thread Gaojinchao
I have filed issue https://issues.apache.org/jira/browse/HBASE-5008 . A patch
is being verified.
As Lars said in the issue:
1. A requested flush was canceled (because we had closed the region in the
process of splitting), and we never unset flushRequested.
2. The region split failed and we reopened the parent region.
3. From this point on, every new flush request is ignored because flushRequested
is already true.
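
The three steps above can be replayed with a minimal, self-contained simulation of the stuck flag (the names below are illustrative, not real HBase code):

```java
// Minimal simulation of the HBASE-5008 failure mode: a flush request is
// skipped while the region is closing for a split, the flushRequested
// flag is never reset, and every later request is coalesced away.
public class Main {
    static boolean closing = false;        // region closed for splitting
    static boolean flushRequested = false; // "a flush is already pending"

    // Models flush-request coalescing: returns true if the flush actually
    // ran, false if the request was skipped or ignored.
    static boolean requestFlush() {
        if (flushRequested) {
            return false; // step 3: ignored, a flush is believed pending
        }
        flushRequested = true;
        if (closing) {
            return false; // step 1: skipped, but flushRequested never unset
        }
        flushRequested = false; // healthy path: flush ran, flag cleared
        return true;
    }

    // Replays the scenario: flush during split-close, failed split, reopen.
    static boolean demoStuckFlag() {
        closing = true;        // split in progress, region closed
        requestFlush();        // skipped; flag stays true (the bug)
        closing = false;       // step 2: split failed, region reopened
        return requestFlush(); // false forever: memstore can never flush
    }

    public static void main(String[] args) {
        System.out.println("flush after reopen ran: " + demoStuckFlag());
    }
}
```

Resetting flushRequested on the skipped/canceled path, which is what the patch targets, restores flushing after the rollback.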

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: December 12, 2011 6:33
To: dev@hbase.apache.org
Subject: Re: The clusters can't provide services because Region can't flush.

On Sun, Dec 11, 2011 at 4:34 AM, Gaojinchao  wrote:
> Hbase version 0.90.4 + patches
>
> My analysis is as follows:
>


I'm not sure I follow Jinchao.  All handlers are blocked because mem
is full and we got into a situation where we can't flush brought on by
split?  How would you fix it?

Thanks,
St.Ack


The clusters can't provide services because Region can't flush.

2011-12-11 Thread Gaojinchao
Hbase version 0.90.4 + patches

My analysis is as follows:

// Started splitting region b24d8ccb852ff742f2a27d01b7f5853e and closed the region.

2011-12-10 17:32:48,653 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
2011-12-10 17:32:49,759 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: 
disabling compactions & flushes
2011-12-10 17:32:49,759 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Running close preflush of 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.

// Processed a flush request and skipped it, but flushRequested had been set to true
2011-12-10 17:33:06,963 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e., current 
region memstore size 12.6m
2011-12-10 17:33:17,277 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Skipping flush on 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e. because 
closing

// The split of region b24d8ccb852ff742f2a27d01b7f5853e failed and was rolled
// back; the flushRequested flag was still true, so all handlers became blocked

2011-12-10 17:34:01,293 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Cleaned up old failed 
split transaction detritus: 
hdfs://193.195.18.121:9000/hbase/Htable_UFDR_004/b24d8ccb852ff742f2a27d01b7f5853e/splits
2011-12-10 17:34:01,294 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.; 
next sequenceid=15494173
2011-12-10 17:34:01,295 INFO 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of 
failed split of 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
2011-12-10 17:43:10,147 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 19 on 20020' on region
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size


// All handlers were blocked. The cluster could not provide service

2011-12-10 17:34:01,295 INFO 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback of 
failed split of 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.
2011-12-10 17:43:10,147 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 19 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:10,192 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 34 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:10,193 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 51 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:10,196 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 85 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:10,199 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 88 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:10,202 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 44 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:11,663 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 2 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:11,665 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 10 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:11,670 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 75 on 20020' on region 
Htable_UFDR_004,09781,1323508582833.b24d8ccb852ff742f2a27d01b7f5853e.: memstore 
size 384.0m is >= than blocking 384.0m size
2011-12-10 17:43:11,671 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Blocking updates for 'IPC Server handler 98 on 20020' on region 
Htable_UFDR_004,09781,1323508582833

Re: FeedbackRe: Suspected memory leak

2011-12-04 Thread Gaojinchao
OK. Does anyone have a better solution? Should we document this in the book?


-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 11:39
To: dev@hbase.apache.org
Subject: Re: FeedbackRe: Suspected memory leak

Jinchao:
Since we found the workaround, can you summarize the following statistics
on HBASE-4633 ?

Thanks

2011/12/4 Gaojinchao 

> Yes, I have tested, System is fine.
> Nearly one hours , trigger a full GC.
> 10022.210: [Full GC (System) 10022.210: [Tenured:
> 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K),
> [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75
> sys=0.00, real=1.75 secs]
> .
>
> .
> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K),
> 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times:
> user=1.90 sys=0.01, real=0.14 secs]
> 13624.630: [Full GC (System) 13624.630: [Tenured:
> 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K),
> [Perm : 19225K->19225K(65536K)], 1.9531660 secs]
>   [Times: user=1.94 sys=0.00, real=1.96 secs]
>
> 7543 root  20   0 17.0g  15g 9892 S0 32.9   1184:34 java
> 7543 root  20   0 17.0g  15g 9892 S1 32.9   1184:34 java
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: December 5, 2011 9:06
> To: dev@hbase.apache.org
> Subject: Re: FeedbackRe: Suspected memory leak
>
> Can you try specifying XX:MaxDirectMemorySize with moderate value and see
> if the leak gets under control ?
>
> Thanks
>
> 2011/12/4 Gaojinchao 
>
> > I have attached the stack in
> > https://issues.apache.org/jira/browse/HBASE-4633.
> > I will update our story.
> >
> >
> > -----Original Message-----
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: December 5, 2011 7:37
> > To: dev@hbase.apache.org; lars hofhansl
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > I looked through TRUNK and 0.90 code but didn't find
> > HBaseClient.Connection.setParam().
> > The method should be sendParam().
> >
> > When I was in China I tried to access Jonathan's post but wasn't able to.
> >
> > If Jinchao's stack trace resonates with the one Jonathan posted, we
> should
> > consider using netty for HBaseClient.
> >
> > Cheers
> >
> > On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl 
> wrote:
> >
> > > I think HBASE-4508 is unrelated.
> > > The "connections" I referring to are HBaseClient.Connection objects
> (not
> > > HConnections).
> > > It turns out that HBaseClient.Connection.setParam is actually called
> > > directly by the client threads, which means we can get
> > > an unlimited amount of DirectByteBuffers (until we get a full GC).
> > >
> > > The JDK will cache 3 per thread with a size necessary to serve the IO.
> So
> > > sending some large requests from many thread
> > > will lead to OOM.
> > >
> > > I think that was a related thread that Stack forwarded a while back
> from
> > > the asynchbase mailing lists.
> > >
> > > Jinchao, could you add a text version (not a png image, please :-) ) of
> > > this to the jira?
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > - Original Message -
> > > From: Ted Yu 
> > > To: dev@hbase.apache.org; lars hofhansl 
> > > Cc: Gaojinchao ; Chenjian <
> > jean.chenj...@huawei.com>;
> > > wenzaohua 
> > > Sent: Sunday, December 4, 2011 12:43 PM
> > > Subject: Re: FeedbackRe: Suspected memory leak
> > >
> > > I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution
> because
> > > 0.90.5 hasn't been released.
> > > Assuming the NIO consumption is related to the number of connections
> from
> > > client side, it would help to perform benchmarking on 0.90.5
> > >
> > > Jinchao:
> > > Please attach stack trace to HBASE-4633 so that we can verify our
> > > assumptions.
> > >
> > > Thanks
> > >
> > > On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl 
> > > wrote:
> > >
> > > > Thanks. Now the question is: How many connection threads do we have?
> > > >
> > > > I think there is one per regionserver, which would indeed be a
> problem.
> > > > Need to look at the code again (I'm only partially familiar with the
> > > > client code).
> > > >
> > > > Either the client should chunk (like the server does), o

Re: FeedbackRe: Suspected memory leak

2011-12-04 Thread Gaojinchao
Yes, I have tested it; the system is fine.
Roughly every hour a full GC is triggered.
10022.210: [Full GC (System) 10022.210: [Tenured: 577566K->257349K(1048576K), 
1.7515610 secs] 9651924K->257349K(14260672K), [Perm : 19161K->19161K(65536K)], 
1.7518350 secs] [Times: user=1.75 sys=0.00, real=1.75 secs]
.

.
13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K), 0.1414370 
secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times: user=1.90 
sys=0.01, real=0.14 secs]
13624.630: [Full GC (System) 13624.630: [Tenured: 310202K->175378K(1048576K), 
1.9529280 secs] 11581276K->175378K(14260672K), [Perm : 19225K->19225K(65536K)], 
1.9531660 secs] 
   [Times: user=1.94 sys=0.00, real=1.96 secs]

7543 root  20   0 17.0g  15g 9892 S0 32.9   1184:34 java
7543 root  20   0 17.0g  15g 9892 S1 32.9   1184:34 java

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 9:06
To: dev@hbase.apache.org
Subject: Re: FeedbackRe: Suspected memory leak

Can you try specifying XX:MaxDirectMemorySize with moderate value and see
if the leak gets under control ?

Thanks

2011/12/4 Gaojinchao 

> I have attached the stack in
> https://issues.apache.org/jira/browse/HBASE-4633.
> I will update our story.
>
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: December 5, 2011 7:37
> To: dev@hbase.apache.org; lars hofhansl
> Subject: Re: FeedbackRe: Suspected memory leak
>
> I looked through TRUNK and 0.90 code but didn't find
> HBaseClient.Connection.setParam().
> The method should be sendParam().
>
> When I was in China I tried to access Jonathan's post but wasn't able to.
>
> If Jinchao's stack trace resonates with the one Jonathan posted, we should
> consider using netty for HBaseClient.
>
> Cheers
>
> On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl  wrote:
>
> > I think HBASE-4508 is unrelated.
> > The "connections" I referring to are HBaseClient.Connection objects (not
> > HConnections).
> > It turns out that HBaseClient.Connection.setParam is actually called
> > directly by the client threads, which means we can get
> > an unlimited amount of DirectByteBuffers (until we get a full GC).
> >
> > The JDK will cache 3 per thread with a size necessary to serve the IO. So
> > sending some large requests from many thread
> > will lead to OOM.
> >
> > I think that was a related thread that Stack forwarded a while back from
> > the asynchbase mailing lists.
> >
> > Jinchao, could you add a text version (not a png image, please :-) ) of
> > this to the jira?
> >
> >
> > -- Lars
> >
> >
> >
> > - Original Message -
> > From: Ted Yu 
> > To: dev@hbase.apache.org; lars hofhansl 
> > Cc: Gaojinchao ; Chenjian <
> jean.chenj...@huawei.com>;
> > wenzaohua 
> > Sent: Sunday, December 4, 2011 12:43 PM
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
> > 0.90.5 hasn't been released.
> > Assuming the NIO consumption is related to the number of connections from
> > client side, it would help to perform benchmarking on 0.90.5
> >
> > Jinchao:
> > Please attach stack trace to HBASE-4633 so that we can verify our
> > assumptions.
> >
> > Thanks
> >
> > On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl 
> > wrote:
> >
> > > Thanks. Now the question is: How many connection threads do we have?
> > >
> > > I think there is one per regionserver, which would indeed be a problem.
> > > Need to look at the code again (I'm only partially familiar with the
> > > client code).
> > >
> > > Either the client should chunk (like the server does), or there should
> be
> > > a limited number of thread that
> > > perform IO on behalf of the client (or both).
> > >
> > > -- Lars
> > >
> > >
> > > - Original Message -
> > > From: Gaojinchao 
> > > To: "dev@hbase.apache.org" ; lars hofhansl <
> > > lhofha...@yahoo.com>
> > > Cc: Chenjian ; wenzaohua <
> wenzao...@huawei.com
> > >
> > > Sent: Saturday, December 3, 2011 11:22 PM
> > > Subject: Re: FeedbackRe: Suspected memory leak
> > >
> > > This is dump stack.
> > >
> > >
> > > -----Original Message-----
> > > From: lars hofhansl [mailto:lhofha...@yahoo.com]
> > > Sent: December 4, 2011 14:15
> > > To: dev@hbase.apache.org
> > > Cc: Chenjian; wenzaohua
> > > Subject: Re: FeedbackRe: Su

Re: FeedbackRe: Suspected memory leak

2011-12-04 Thread Gaojinchao
Some information has updated in HBASE-4633.


-----Original Message-----
From: Gaojinchao [mailto:gaojinc...@huawei.com]
Sent: December 5, 2011 8:45
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I have attached the stack in https://issues.apache.org/jira/browse/HBASE-4633.
I will update our story.


-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 7:37
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I looked through TRUNK and 0.90 code but didn't find
HBaseClient.Connection.setParam().
The method should be sendParam().

When I was in China I tried to access Jonathan's post but wasn't able to.

If Jinchao's stack trace resonates with the one Jonathan posted, we should
consider using netty for HBaseClient.

Cheers

On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl  wrote:

> I think HBASE-4508 is unrelated.
> The "connections" I referring to are HBaseClient.Connection objects (not
> HConnections).
> It turns out that HBaseClient.Connection.setParam is actually called
> directly by the client threads, which means we can get
> an unlimited amount of DirectByteBuffers (until we get a full GC).
>
> The JDK will cache 3 per thread with a size necessary to serve the IO. So
> sending some large requests from many thread
> will lead to OOM.
>
> I think that was a related thread that Stack forwarded a while back from
> the asynchbase mailing lists.
>
> Jinchao, could you add a text version (not a png image, please :-) ) of
> this to the jira?
>
>
> -- Lars
>
>
>
> - Original Message -
> From: Ted Yu 
> To: dev@hbase.apache.org; lars hofhansl 
> Cc: Gaojinchao ; Chenjian ;
> wenzaohua 
> Sent: Sunday, December 4, 2011 12:43 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
> 0.90.5 hasn't been released.
> Assuming the NIO consumption is related to the number of connections from
> client side, it would help to perform benchmarking on 0.90.5
>
> Jinchao:
> Please attach stack trace to HBASE-4633 so that we can verify our
> assumptions.
>
> Thanks
>
> On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl 
> wrote:
>
> > Thanks. Now the question is: How many connection threads do we have?
> >
> > I think there is one per regionserver, which would indeed be a problem.
> > Need to look at the code again (I'm only partially familiar with the
> > client code).
> >
> > Either the client should chunk (like the server does), or there should be
> > a limited number of thread that
> > perform IO on behalf of the client (or both).
> >
> > -- Lars
> >
> >
> > - Original Message -
> > From: Gaojinchao 
> > To: "dev@hbase.apache.org" ; lars hofhansl <
> > lhofha...@yahoo.com>
> > Cc: Chenjian ; wenzaohua  >
> > Sent: Saturday, December 3, 2011 11:22 PM
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > This is dump stack.
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:lhofha...@yahoo.com]
> > Sent: December 4, 2011 14:15
> > To: dev@hbase.apache.org
> > Cc: Chenjian; wenzaohua
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > Dropping user list.
> >
> > Could you (or somebody) point me to where the client is using NIO?
> > I'm looking at HBaseClient and I do not see references to NIO, also it
> > seems that all work is handed off to
> > separate threads: HBaseClient.Connection, and the JDK will not cache more
> > than 3 direct buffers per thread.
> >
> > It's possible (likely?) that I missed something in the code.
> >
> > Thanks.
> >
> > -- Lars
> >
> > 
> > From: Gaojinchao 
> > To: "u...@hbase.apache.org" ; "
> dev@hbase.apache.org"
> > 
> > Cc: Chenjian ; wenzaohua  >
> > Sent: Saturday, December 3, 2011 7:57 PM
> > Subject: FeedbackRe: Suspected memory leak
> >
> > Thank you for your help.
> >
> > This issue appears to be a configuration problem:
> > 1. HBase client uses NIO(socket) API that uses the direct memory.
> > 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if
> > there doesn't have "full gc", all direct memory can't reclaim.
> > Unfortunately, using GC confiugre parameter of our client doesn't produce
> > any "full gc".
> >
> > This is only a preliminary result,  All tests is running, If have any
> > further results , we will be fed back.
> > Fi

Re: FeedbackRe: Suspected memory leak

2011-12-04 Thread Gaojinchao
I have attached the stack in https://issues.apache.org/jira/browse/HBASE-4633.
I will update our story.


-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 7:37
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I looked through TRUNK and 0.90 code but didn't find
HBaseClient.Connection.setParam().
The method should be sendParam().

When I was in China I tried to access Jonathan's post but wasn't able to.

If Jinchao's stack trace resonates with the one Jonathan posted, we should
consider using netty for HBaseClient.

Cheers

On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl  wrote:

> I think HBASE-4508 is unrelated.
> The "connections" I referring to are HBaseClient.Connection objects (not
> HConnections).
> It turns out that HBaseClient.Connection.setParam is actually called
> directly by the client threads, which means we can get
> an unlimited amount of DirectByteBuffers (until we get a full GC).
>
> The JDK will cache 3 per thread with a size necessary to serve the IO. So
> sending some large requests from many thread
> will lead to OOM.
>
> I think that was a related thread that Stack forwarded a while back from
> the asynchbase mailing lists.
>
> Jinchao, could you add a text version (not a png image, please :-) ) of
> this to the jira?
>
>
> -- Lars
>
>
>
> - Original Message -
> From: Ted Yu 
> To: dev@hbase.apache.org; lars hofhansl 
> Cc: Gaojinchao ; Chenjian ;
> wenzaohua 
> Sent: Sunday, December 4, 2011 12:43 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
> 0.90.5 hasn't been released.
> Assuming the NIO consumption is related to the number of connections from
> client side, it would help to perform benchmarking on 0.90.5
>
> Jinchao:
> Please attach stack trace to HBASE-4633 so that we can verify our
> assumptions.
>
> Thanks
>
> On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl 
> wrote:
>
> > Thanks. Now the question is: How many connection threads do we have?
> >
> > I think there is one per regionserver, which would indeed be a problem.
> > Need to look at the code again (I'm only partially familiar with the
> > client code).
> >
> > Either the client should chunk (like the server does), or there should be
> > a limited number of thread that
> > perform IO on behalf of the client (or both).
> >
> > -- Lars
> >
> >
> > - Original Message -
> > From: Gaojinchao 
> > To: "dev@hbase.apache.org" ; lars hofhansl <
> > lhofha...@yahoo.com>
> > Cc: Chenjian ; wenzaohua  >
> > Sent: Saturday, December 3, 2011 11:22 PM
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > This is dump stack.
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:lhofha...@yahoo.com]
> > Sent: December 4, 2011 14:15
> > To: dev@hbase.apache.org
> > Cc: Chenjian; wenzaohua
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > Dropping user list.
> >
> > Could you (or somebody) point me to where the client is using NIO?
> > I'm looking at HBaseClient and I do not see references to NIO, also it
> > seems that all work is handed off to
> > separate threads: HBaseClient.Connection, and the JDK will not cache more
> > than 3 direct buffers per thread.
> >
> > It's possible (likely?) that I missed something in the code.
> >
> > Thanks.
> >
> > -- Lars
> >
> > 
> > From: Gaojinchao 
> > To: "u...@hbase.apache.org" ; "
> dev@hbase.apache.org"
> > 
> > Cc: Chenjian ; wenzaohua  >
> > Sent: Saturday, December 3, 2011 7:57 PM
> > Subject: FeedbackRe: Suspected memory leak
> >
> > Thank you for your help.
> >
> > This issue appears to be a configuration problem:
> > 1. HBase client uses NIO(socket) API that uses the direct memory.
> > 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if
> > there doesn't have "full gc", all direct memory can't reclaim.
> > Unfortunately, using GC confiugre parameter of our client doesn't produce
> > any "full gc".
> >
> > This is only a preliminary result,  All tests is running, If have any
> > further results , we will be fed back.
> > Finally , I will update our story to issue
> > https://issues.apache.org/jira/browse/HBASE-4633.
> >
> > If our digging is crrect, whether we should set a default value fo

Re: FeedbackRe: Suspected memory leak

2011-12-03 Thread Gaojinchao
This is the dump stack.


-----Original Message-----
From: lars hofhansl [mailto:lhofha...@yahoo.com]
Sent: December 4, 2011 14:15
To: dev@hbase.apache.org
Cc: Chenjian; wenzaohua
Subject: Re: FeedbackRe: Suspected memory leak

Dropping user list.

Could you (or somebody) point me to where the client is using NIO?
I'm looking at HBaseClient and I do not see references to NIO, also it seems 
that all work is handed off to
separate threads: HBaseClient.Connection, and the JDK will not cache more than 
3 direct buffers per thread.

It's possible (likely?) that I missed something in the code.

Thanks.

-- Lars

____
From: Gaojinchao 
To: "u...@hbase.apache.org" ; "dev@hbase.apache.org" 
 
Cc: Chenjian ; wenzaohua  
Sent: Saturday, December 3, 2011 7:57 PM
Subject: FeedbackRe: Suspected memory leak

Thank you for your help.

This issue appears to be a configuration problem:
1. The HBase client uses the NIO (socket) API, which allocates direct memory.
2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if
no full GC occurs, direct memory is never reclaimed. Unfortunately, the GC
configuration parameters of our client never trigger a full GC.

This is only a preliminary result; all tests are still running, and we will
report back any further findings.
Finally, I will update our story in issue
https://issues.apache.org/jira/browse/HBASE-4633.

If our digging is correct, should we set a default value for
"-XX:MaxDirectMemorySize" to prevent this situation?
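For readers unfamiliar with the behavior described above, here is a minimal standalone sketch (the class name and the 16 MB size are illustrative, not from the original mail): a direct ByteBuffer's native memory lives outside the Java heap and is released only when its small on-heap wrapper object is garbage-collected.

```java
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    // Allocate a direct (off-heap) buffer; its native memory counts
    // against -XX:MaxDirectMemorySize, not against the -Xmx heap.
    static ByteBuffer allocateDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocateDirect(16 * 1024 * 1024);
        // The native memory behind 'direct' is freed only when the small
        // ByteBuffer wrapper itself is garbage-collected, so with a large
        // -Xmx and no full GC the off-heap usage can keep growing until
        // the MaxDirectMemorySize limit (by default, -Xmx) is exhausted.
        System.out.println("isDirect=" + direct.isDirect()
                + " capacity=" + direct.capacity());
        // prints: isDirect=true capacity=16777216
    }
}
```

Running it with an explicit `-XX:MaxDirectMemorySize` shows the cap independent of the heap size.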


Thanks

-----Original Message-----
From: bijieshan [mailto:bijies...@huawei.com]
Sent: December 2, 2011 15:37
To: dev@hbase.apache.org; u...@hbase.apache.org
Cc: Chenjian; wenzaohua
Subject: Re: Suspected memory leak

Thank you all.
I think it's the same problem as in the link provided by Stack: the heap size
is stable, but the non-heap size keeps growing, so I don't think it is the CMS
GC bug.
We have also examined the content of the problem memory section; all the
records contain info like the following:
"|www.hostname02087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||460|||Agent";
"BBZHtable_UFDR_058,048342220093168-02570"


Jieshan.

-----Original Message-----
From: Kihwal Lee [mailto:kih...@yahoo-inc.com]
Sent: December 2, 2011 4:20
To: dev@hbase.apache.org
Cc: Ramakrishna s vasudevan; u...@hbase.apache.org
Subject: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a 
bug/bugs in the CMS GC, collection may not happen and the heap can grow really 
big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for 
details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket 
related objects were being collected properly. This option forces the 
concurrent marker to be one thread. This was for HDFS, but I think the same 
applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack"  wrote:

Make sure it's not the issue that Jonathan Payne identified a while
back: 
https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack  


FeedbackRe: Suspected memory leak

2011-12-03 Thread Gaojinchao
Thank you for your help.

This issue appears to be a configuration problem:
1. The HBase client uses the NIO (socket) API, which allocates direct memory.
2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if
no full GC occurs, direct memory is never reclaimed. Unfortunately, the GC
configuration parameters of our client never trigger a full GC.

This is only a preliminary result; all tests are still running, and we will
report back any further findings.
Finally, I will update our story in issue
https://issues.apache.org/jira/browse/HBASE-4633.

If our digging is correct, should we set a default value for
"-XX:MaxDirectMemorySize" to prevent this situation?


Thanks

-----Original Message-----
From: bijieshan [mailto:bijies...@huawei.com]
Sent: December 2, 2011 15:37
To: dev@hbase.apache.org; u...@hbase.apache.org
Cc: Chenjian; wenzaohua
Subject: Re: Suspected memory leak

Thank you all.
I think it's the same problem as in the link provided by Stack: the heap size
is stable, but the non-heap size keeps growing, so I don't think it is the CMS
GC bug.
We have also examined the content of the problem memory section; all the
records contain info like the following:
"|www.hostname02087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||460|||Agent"
"BBZHtable_UFDR_058,048342220093168-02570"


Jieshan.

-----Original Message-----
From: Kihwal Lee [mailto:kih...@yahoo-inc.com]
Sent: December 2, 2011 4:20
To: dev@hbase.apache.org
Cc: Ramakrishna s vasudevan; u...@hbase.apache.org
Subject: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a 
bug/bugs in the CMS GC, collection may not happen and the heap can grow really 
big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for 
details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket 
related objects were being collected properly. This option forces the 
concurrent marker to be one thread. This was for HDFS, but I think the same 
applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack"  wrote:

Make sure it's not the issue that Jonathan Payne identified a while
back: 
https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack



Re: Status of 0.92RC

2011-11-25 Thread Gaojinchao
I use SUSE. Do I need to try it?

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: November 26, 2011 11:15
To: dev@hbase.apache.org
Cc: lars hofhansl
Subject: Re: Status of 0.92RC

On Fri, Nov 25, 2011 at 6:16 PM, Ted Yu  wrote:
> I then looped TestHCM 4 times and there was no test failure.
>

Its fine on mac.  On ubuntu:

---
Test set: org.apache.hadoop.hbase.client.TestHCM
---
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
757.578 sec <<< FAILURE!
testClosing(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed:
35.34 sec  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertFalse(Assert.java:68)
at org.junit.Assert.assertFalse(Assert.java:79)
at org.apache.hadoop.hbase.client.TestHCM.testClosing(TestHCM.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)



Line numbers are off because I'm messing.  Its saying connection 1 is
closed if I test it just after creating it.

St.Ack

> On Fri, Nov 25, 2011 at 5:39 PM, Ted Yu  wrote:
>
>> I looped TestHCM#testClosing 5 times on MacBook and didn't see test
>> failure.
>>
>> Stack:
>> Can you share the test output ?
>>
>> Thanks
>>
>>
>> On Fri, Nov 25, 2011 at 5:04 PM, lars hofhansl wrote:
>>
>>> I added testClosing as part of HBASE-4805, I'll have a look as soon as I
>>> get a chance.
>>>
>>>
>>>
>>> 
>>>  From: Stack 
>>> To: HBase Dev List 
>>> Sent: Friday, November 25, 2011 2:12 PM
>>> Subject: Status of 0.92RC
>>>
>>> I'm having a little difficulty getting all tests to pass.  On
>>> linux/ubuntu, TestHCM (testClosing strange issue) and TestReplication
>>> are failing for me.  On mac osx, it'll build without fail about 50% of
>>> the time.  I'd like to make it so tests pass all the time before
>>> cutting the RC.  Thats what I'm at these times.
>>>
>>> Also, 0.92 build on jenkins has been turned off by Apache
>>> Infrastructure.  It was hanging.  Its done this in the past too and
>>> when it hangs it requires a jenkins reboot which doesn't make Apache
>>> Infrastructure team too happy.  The hang looks to me like a Jenkins
>>> bug because build hangs before we even checkout src.  Am trying to see
>>> what can be done to get it going again but thats the story at the mo.
>>>
>>> St.Ack
>>>
>>
>>
>


Re: Build failed in Jenkins: HBase-TRUNK-security #7

2011-11-25 Thread Gaojinchao
Regarding
testMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase):
I filed an issue and attached a patch.
Please review it and see whether the approach makes sense.

-----Original Message-----
From: Gaojinchao [mailto:gaojinc...@huawei.com]
Sent: November 25, 2011 8:59
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK-security #7

Yes, the same reason as I saw.

-----Original Message-----
From: Jonathan Hsieh [mailto:j...@cloudera.com]
Sent: November 24, 2011 22:32
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK-security #7

Gaojinchao,

I think there might be a general class of flakiness having to do with
region assignment/opening not being synchronous.

Do you think this might be related?
https://issues.apache.org/jira/browse/HBASE-4852

The root of the problem in HBASE-4852 is that even though open/assign has
returned from the master/admin point of view, it may not have completed
opening the region on the region server.  I'm guessing there might be
something to ensure regions are loaded before continuing that I just
haven't found yet...

Thanks,
Jon.

On Thu, Nov 24, 2011 at 4:38 AM, Gaojinchao  wrote:

> Failed tests:
> testMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase):
> expected:<[]> but
>
> It seems to be the same reason as the first issue: the region didn't finish
> opening.
> We may need to wait until there are no regions in transition (RIT) when the
> cluster is restarted.
>
>
>
> -----Original Message-----
> From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> Sent: November 24, 2011 18:19
> To: dev@hbase.apache.org
> Subject: Build failed in Jenkins: HBase-TRUNK-security #7
>
> See <https://builds.apache.org/job/HBase-TRUNK-security/7/changes>
>
> Changes:
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT
> TEMPORARILY TO GET TED COMMENT IN
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids
>
> [tedyu] HBASE-4739  Master dying while going to close a region can leave
> it in transition
>   forever (Gao Jinchao)
>
> [nspiegelberg] HBASE-4787 Rename HTable thread pool
>
> [tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache
> maven repository not ready
>
> [tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release
>
> [karthik] HBASE-4772 Utility to Create StoreFiles
>
> [nspiegelberg] HBASE-4785 Improve recovery time of HBase client when a
> region server dies.
>
> [Gary Helmling] HBASE-4857  Recursive loop on KeeperException in
> AuthenticationTokenSecretManager
>
> [ramkrishna] HBASE-4308 Race between RegionOpenedHandler and
> AssignmentManager(Ram)
>
> [nspiegelberg] HBASE-4783 Improve RowCounter to count rows in a specific
> key range.
>
> --
> [...truncated 1917 lines...]
> Running org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.922 sec
> Running org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 328.263 sec
> Running org.apache.hadoop.hbase.regionserver.wal.TestWALActionsListener
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.143 sec
> Running org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 165.324 sec
> Running org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit
> Tests run: 29, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 256.15 sec
> Running org.apache.hadoop.hbase.regionserver.TestColumnSeeking
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.209 sec
> Running
> org.apache.hadoop.hbase.regionserver.TestReadWriteConsistencyControl
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.087 sec
> Running org.apache.hadoop.hbase.regionserver.metrics.TestSchemaMetrics
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.875 sec
> Running org.apache.hadoop.hbase.regionserver.metrics.TestSchemaConfigured
> Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.109 sec
> Running org.apache.hadoop.hbase.regionserver.TestKeyValueHeap
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.258 sec
> Running org.apache.hadoop.hbase.regionserver.TestRpcMetrics
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.278 sec
> Running org.apache.hadoop.hbase.regionserver.TestScanWithBloomError
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.778 sec
> Running org.apache.hadoop.hbase.regionserver.TestGetClosestAtOrBefore
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.95 sec

Re: Build failed in Jenkins: HBase-TRUNK-security #7

2011-11-24 Thread Gaojinchao
Yes, the same reason as I saw.

-----Original Message-----
From: Jonathan Hsieh [mailto:j...@cloudera.com]
Sent: November 24, 2011 22:32
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK-security #7

Gaojinchao,

I think there might be a general class of flakiness having to do with
region assignment/opening not being synchronous.

Do you think this might be related?
https://issues.apache.org/jira/browse/HBASE-4852

The root of the problem in HBASE-4852 is that even though open/assign has
returned from the master/admin point of view, it may not have completed
opening the region on the region server.  I'm guessing there might be
something to ensure regions are loaded before continuing that I just
haven't found yet...

Thanks,
Jon.

On Thu, Nov 24, 2011 at 4:38 AM, Gaojinchao  wrote:

> Failed tests:
> testMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase):
> expected:<[]> but
>
> It seems to be the same reason as the first issue: the region didn't finish
> opening.
> We may need to wait until there are no regions in transition (RIT) when the
> cluster is restarted.
>
>
>
> -----Original Message-----
> From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> Sent: November 24, 2011 18:19
> To: dev@hbase.apache.org
> Subject: Build failed in Jenkins: HBase-TRUNK-security #7
>
> See <https://builds.apache.org/job/HBase-TRUNK-security/7/changes>
>
> Changes:
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT
> TEMPORARILY TO GET TED COMMENT IN
>
> [stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids
>
> [tedyu] HBASE-4739  Master dying while going to close a region can leave
> it in transition
>   forever (Gao Jinchao)
>
> [nspiegelberg] HBASE-4787 Rename HTable thread pool
>
> [tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache
> maven repository not ready
>
> [tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release
>
> [karthik] HBASE-4772 Utility to Create StoreFiles
>
> [nspiegelberg] HBASE-4785 Improve recovery time of HBase client when a
> region server dies.
>
> [Gary Helmling] HBASE-4857  Recursive loop on KeeperException in
> AuthenticationTokenSecretManager
>
> [ramkrishna] HBASE-4308 Race between RegionOpenedHandler and
> AssignmentManager(Ram)
>
> [nspiegelberg] HBASE-4783 Improve RowCounter to count rows in a specific
> key range.
>
> --
> [...truncated 1917 lines...]

Re: Build failed in Jenkins: HBase-TRUNK-security #7

2011-11-24 Thread Gaojinchao
Failed tests:   
testMetaRebuild(org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase): 
expected:<[]> but

It seems to be the same reason as the first issue: the region didn't finish
opening.
We may need to wait until there are no regions in transition (RIT) when the
cluster is restarted.



-----Original Message-----
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: November 24, 2011 18:19
To: dev@hbase.apache.org
Subject: Build failed in Jenkins: HBase-TRUNK-security #7

See 

Changes:

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT 
TEMPORARILY TO GET TED COMMENT IN

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids

[tedyu] HBASE-4739  Master dying while going to close a region can leave it in 
transition
   forever (Gao Jinchao)

[nspiegelberg] HBASE-4787 Rename HTable thread pool

[tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache maven 
repository not ready

[tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release

[karthik] HBASE-4772 Utility to Create StoreFiles

[nspiegelberg] HBASE-4785 Improve recovery time of HBase client when a region 
server dies.

[Gary Helmling] HBASE-4857  Recursive loop on KeeperException in 
AuthenticationTokenSecretManager

[ramkrishna] HBASE-4308 Race between RegionOpenedHandler and 
AssignmentManager(Ram)

[nspiegelberg] HBASE-4783 Improve RowCounter to count rows in a specific key 
range.

--
[...truncated 1917 lines...]

Re: Build failed in Jenkins: HBase-TRUNK-security #7

2011-11-24 Thread Gaojinchao
Tests in error:
testRegionTransitionOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver)

I filed an issue: https://issues.apache.org/jira/browse/HBASE-4864

-----Original Message-----
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: November 24, 2011 18:19
To: dev@hbase.apache.org
Subject: Build failed in Jenkins: HBase-TRUNK-security #7

See 

Changes:

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT 
TEMPORARILY TO GET TED COMMENT IN

[stack] HBASE-4853 HBASE-4789 does overzealous pruning of seqids

[tedyu] HBASE-4739  Master dying while going to close a region can leave it in 
transition
   forever (Gao Jinchao)

[nspiegelberg] HBASE-4787 Rename HTable thread pool

[tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache maven 
repository not ready

[tedyu] HBASE-4856  Upgrade zookeeper to 3.4.0 release

[karthik] HBASE-4772 Utility to Create StoreFiles

[nspiegelberg] HBASE-4785 Improve recovery time of HBase client when a region 
server dies.

[Gary Helmling] HBASE-4857  Recursive loop on KeeperException in 
AuthenticationTokenSecretManager

[ramkrishna] HBASE-4308 Race between RegionOpenedHandler and 
AssignmentManager(Ram)

[nspiegelberg] HBASE-4783 Improve RowCounter to count rows in a specific key 
range.

--
[...truncated 1917 lines...]

Re: Time for 0.90.5

2011-11-21 Thread Gaojinchao
We are also ready to start using the latest snapshot version. :)


-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] on behalf of Jean-Daniel Cryans
Sent: November 22, 2011 7:40
To: dev@hbase.apache.org
Subject: Time for 0.90.5

Hey devs,

I know everyone's focusing on 0.92 and all, but I think we need a
0.90.5 to ease some of the support pain. For example, I helped drevell
on IRC today with something that turned out to be a nasty version of
HBASE-4168. Basically:

- Shut down the machine that has .META. (or maybe even -ROOT-)
- Let the master replay the logs
- Once it's done, it reassigns .META. but the CT's can't find it
because they are stuck on NoRouteToHostException (that's HBASE-4168)
- Master is stuck
- In their case it unstuck itself === >30min later === when they
rebooted the machine and the master started getting ConnectionRefused
instead.

There's a lot of other good stuff in there that people might need. At
SU we've been running on a snapshot of 0.90 from late October.

J-D


Re: Build failed in Jenkins: HBase-TRUNK #2439

2011-11-14 Thread Gaojinchao
I don't know why the block heap size depends on the JDK version.
I will try to analyze it further.

This test case passes in my local environment.

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: November 15, 2011 14:42
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK #2439

On Mon, Nov 14, 2011 at 10:34 PM, Gaojinchao  wrote:
>  Look at this comment:
>  public void testBlockHeapSize() {
>    // We have seen multiple possible values for this estimate of the heap size
>    // of a ByteBuffer, presumably depending on the JDK version.
>    assertTrue(HFileBlock.BYTE_BUFFER_HEAP_SIZE == 64 ||
>               HFileBlock.BYTE_BUFFER_HEAP_SIZE == 80);
> But in https://issues.apache.org/jira/browse/HBASE-4768
>
> We add some code snippets:
>      assertEquals(80, HFileBlock.BYTE_BUFFER_HEAP_SIZE);
>      long byteBufferExpectedSize =
>          ClassSize.align(ClassSize.estimateBase(buf.getClass(), true)
>              + HFileBlock.HEADER_SIZE + size);

You think that broke sizing BlockHeapSize Gao?   You have a fix?

Thanks,
St.Ack


RE: Build failed in Jenkins: HBase-TRUNK #2439

2011-11-14 Thread Gaojinchao
Look at this comment:
  public void testBlockHeapSize() {
    // We have seen multiple possible values for this estimate of the heap size
    // of a ByteBuffer, presumably depending on the JDK version.
    assertTrue(HFileBlock.BYTE_BUFFER_HEAP_SIZE == 64 ||
               HFileBlock.BYTE_BUFFER_HEAP_SIZE == 80);
But in https://issues.apache.org/jira/browse/HBASE-4768
we added some code snippets:
      assertEquals(80, HFileBlock.BYTE_BUFFER_HEAP_SIZE);
      long byteBufferExpectedSize =
          ClassSize.align(ClassSize.estimateBase(buf.getClass(), true)
              + HFileBlock.HEADER_SIZE + size);
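As an aside, the ClassSize.align call in the snippet above rounds an estimated size up to the JVM's 8-byte object alignment. A minimal standalone sketch of that rounding (AlignDemo is a hypothetical name; this mirrors the alignment math, not HBase's actual ClassSize code):

```java
public class AlignDemo {
    // Round an estimated object size up to the next multiple of 8, since
    // HotSpot aligns objects on 8-byte boundaries.
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    public static void main(String[] args) {
        System.out.println(align(1));   // prints: 8
        System.out.println(align(8));   // prints: 8
        System.out.println(align(13));  // prints: 16
    }
}
```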
-----Original Message-----
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: November 15, 2011 13:15
To: dev@hbase.apache.org
Subject: Build failed in Jenkins: HBase-TRUNK #2439

See 

Changes:

[nspiegelberg] HBASE-4768 Per-(table, columnFamily) metrics with configurable 
table name inclusion

Summary: This is an initial version of an HBase trunk diff for per-table/CF
metrics (see JIRA for details). Unit tests mostly pass -- need to look into
TestDistributedLogSplitting. Also still doing cluster testing.

Test Plan: Unit tests, single-node cluster, dev cluster. Need to try bulk-load
map-reduce jobs as well. Observe metrics through JMX.

Reviewers: jgray, nspiegelberg, stack, tedyu, todd, JIRA

Reviewed By: nspiegelberg

CC: nspiegelberg, Liyin, mbautin, tedyu

Differential Revision: 363

[dmeil] HBASE-4786 book.xml,performance.xml adding and reorg of schema info

--
[...truncated 1766 lines...]
Tests run: 44, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.542 sec
Running org.apache.hadoop.hbase.coprocessor.TestMasterObserver
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.964 sec
Running org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.034 sec
Running org.apache.hadoop.hbase.coprocessor.TestRegionObserverStacking
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.663 sec
Running org.apache.hadoop.hbase.coprocessor.TestClassLoading
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.581 sec
Running org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.426 sec
Running 
org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.316 sec
Running 
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.124 sec
Running 
org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithAbort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.756 sec
Running org.apache.hadoop.hbase.coprocessor.TestWALObserver
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.259 sec
Running org.apache.hadoop.hbase.coprocessor.TestCoprocessorInterface
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.456 sec
Running 
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.536 sec
Running org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.398 sec
Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.209 sec
Running org.apache.hadoop.hbase.TestCompare
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.241 sec
Running org.apache.hadoop.hbase.avro.TestAvroServer
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.773 sec
Running org.apache.hadoop.hbase.avro.TestAvroUtil
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.629 sec
Running org.apache.hadoop.hbase.thrift.TestThriftServer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.992 sec
Running org.apache.hadoop.hbase.monitoring.TestTaskMonitor
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.222 sec
Running org.apache.hadoop.hbase.monitoring.TestMemoryBoundedLogMessageBuffer
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.101 sec
Running org.apache.hadoop.hbase.regionserver.TestHRegionInfo
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.599 sec
Running org.apache.hadoop.hbase.regionserver.TestMemStore
Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.583 sec
Running org.apache.hadoop.hbase.regionserver.TestHRegion
Tests run: 57, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.815 sec
Running org.apache.hadoop.hbase.regionserver.TestMasterAddressManager
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.53 sec
Running org.apache.hadoop.hbase.regionserver.Test

Re: Proposal: Dev's/Contributors Meetup on November 29th in San Francisco?

2011-11-13 Thread Gaojinchao
Thanks.

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: November 11, 2011 23:25
To: dev@hbase.apache.org
Subject: Re: Proposal: Dev's/Contributors Meetup on November 29th in San Francisco?

2011/11/10 Gaojinchao :
> If a summary is shared after the meeting so that people outside the USA can
> get it, that would be better. :)
>

We'll make sure any discussion gets echoed in the mailing list/JIRA.
If the meeting is small, maybe we should try and skype a few of you
in?  (This has never worked well in my experience but could give it
another try).

St.Ack


Re: Proposal: Dev's/Contributors Meetup on November 29th in San Francisco?

2011-11-10 Thread Gaojinchao
If a summary is shared after the meeting so that people outside the USA can get 
it, that would be better. :)


-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: November 11, 2011 8:11
To: HBase Dev List
Subject: Proposal: Dev's/Contributors Meetup on November 29th in San Francisco?

What do folks think of a devs/contributors get-together on the 29th in
San Francisco (Location TBD but possibly at SU).  I was thinking
something less than a meetup with 'presentations' but more like a
hackathon where we chat some up front on topics such as what should go
into 0.94, when to cut it, 14.5 seconds worth of blue-skying -- for
fun -- before we deep dive into ugly stuff bug-fixing and testing
whatever the 0.92RC is at the time to help with getting the release
out?

I was thinking we could start at 2pm and work on through the evening?

If above is fine, I'll put up a notice on meetup.

Good stuff,
St.Ack


Re: Larger block sizes

2011-11-09 Thread Gaojinchao
We use 640k because each of my nodes holds 12T of data, and our workload is 
mostly writes.

-----Original Message-----
From: lars hofhansl [mailto:lhofha...@yahoo.com] 
Sent: November 10, 2011 3:27
To: hbase-dev
Subject: Larger block sizes

Did anybody here experiment with larger HBase blocksizes than the default 64k? 
For example 128k, or even 512k or larger.
Seems that for larger cells and scans touching many rows this should help.

I am planning to do some testing, and was wondering if anybody has some already 
some experience.



re: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on Jenkins

2011-11-03 Thread Gaojinchao
I can reproduce it:

---
 T E S T S
---

---
 T E S T S
---
Running org.apache.hadoop.hbase.master.TestMasterFailover
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 90.954 sec <<< 
FAILURE!

Results :

Failed tests:   
testMasterFailoverWithMockedRITOnDeadRS(org.apache.hadoop.hbase.master.TestMasterFailover):
 region=enabledTable,bbb,1319241846089.6b022df3f7399ee977683c6c5e4be009.

Tests run: 4, Failures: 1, Errors: 0, Skipped: 0

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: November 4, 2011 13:21
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails on 
Jenkins

Please run the test in loop.

I can reproduce the failure on my MacBook.

Gary logged a jira about jmx exceptions. They're non-essential.

Cheers

On Thursday, November 3, 2011, lars hofhansl  wrote:
> When I run that locally (latest trunk) it passes:
>
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.master.TestMasterFailover
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 69.721 sec
>
> Results :
>
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
>
> [INFO]

> [INFO] BUILD SUCCESSFUL
> [INFO]

> [INFO] Total time: 2 minutes 29 seconds
> [INFO] Finished at: Thu Nov 03 22:06:25 PDT 2011
> [INFO] Final Memory: 58M/286M
> [INFO]

>
>
> In the log I see some JMX related exceptions, but their timing did not
> suggest any potentially hanging threads.
>
> (Linux, OpenJDK 1.6 64 bit, needed to set umask to 022)
>
>
> -- Lars
>
>
>
> - Original Message -
> From: Ted Yu 
> To: dev@hbase.apache.org
> Cc:
> Sent: Thursday, November 3, 2011 8:55 PM
> Subject: TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS fails
on Jenkins
>
> Hi,
> Currently TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS <
>
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/105/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/
<
https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestMasterFailover/testMasterFailoverWithMockedRITOnDeadRS/
>>
> consistently fails on 0.92 and TRUNK.
>
> I intended to log a JIRA but https://issues.apache.org is giving me 503
> error.
>
> I briefly went over the code.
> I think after each region is added to regionsThatShouldBeOnline, we should
> log the name of region:
> // Region of enabled on dead server gets closed but not ack'd by
master
> region = enabledAndOnDeadRegions.remove(0);
> regionsThatShouldBeOnline.add(region);
> log("2. expecting " + region.toString() + " to be online: ");
>
> so that if the assertion below fails we know what type of scenario wasn't
> working:
> for (HRegionInfo hri : regionsThatShouldBeOnline) {
>   assertTrue("region=" + hri.getRegionNameAsString(),
> onlineRegions.contains(hri));
> }
>
> From the above mentioned test output I saw a lot of:
>
> 2011-11-03 21:52:58,652 FATAL [Thread-558.logSyncer] wal.HLog(1106):
> Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:225)
> at
org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1090)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1194)
> at
org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1056)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
> at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:223)
> ... 4 more
> Caused by: java.io.IOException: DFSOutputStream is closed
> at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3483)
> at
org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at
org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
> ... 8 more
>
> Maybe they have something to do with regions stuck in RIT.
>
> Cheers
>
>


Re: TestHCM.testConnectionUniqueness and ConcurrentModificationException

2011-10-23 Thread Gaojinchao
I open a file and verify it.
My local test case always fails. 

---
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.873 sec <<< 
FAILURE!
testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 
1.978 sec  <<< ERROR!
java.util.ConcurrentModificationException
at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:384)
at java.util.AbstractCollection.toArray(AbstractCollection.java:124)
at java.util.ArrayList.<init>(ArrayList.java:131)
at 
org.apache.hadoop.hbase.client.TestHCM.getValidKeyCount(TestHCM.java:136)
at 
org.apache.hadoop.hbase.client.TestHCM.testConnectionUniqueness(TestHCM.java:222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: October 24, 2011 5:17
To: dev@hbase.apache.org
Cc: Bright Fulton
Subject: Re: TestHCM.testConnectionUniqueness and ConcurrentModificationException

On Sun, Oct 23, 2011 at 12:43 PM, Ted Yu  wrote:
> getValidKeyCount() is only used in an assertion in createNewConfigurations()
> which is called by a disabled test, testManyNewConnectionsDoesnotOOME().
>
> Looks like we can remove getValidKeyCount() and its references.
>

+1 to removing crud.

St.Ack
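
The ConcurrentModificationException above is the textbook fail-fast iterator 
failure: getValidKeyCount() copied the connection map's key set while another 
code path was mutating the map. The minimal, self-contained sketch below shows 
the failure mode and one common remedy (a weakly consistent ConcurrentHashMap 
iterator); the class and variable names are illustrative, not HBase code.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CmeDemo {
    public static void main(String[] args) {
        // A LinkedHashMap iterator is fail-fast: structurally modifying the
        // map mid-iteration throws ConcurrentModificationException, which is
        // what the toArray() call inside getValidKeyCount() hit.
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        boolean caught = false;
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            caught = true;
        }
        System.out.println("CME caught: " + caught);

        // A ConcurrentHashMap iterator is weakly consistent and never throws
        // CME, at the cost of possibly not reflecting concurrent updates.
        Map<String, Integer> safe = new ConcurrentHashMap<>(map);
        for (String key : safe.keySet()) {
            safe.put("d", 4); // safe: no exception from the iterator
        }
        System.out.println("size: " + safe.size());
    }
}
```

Removing the helper entirely, as discussed above, sidesteps the problem; the 
ConcurrentHashMap variant is only relevant if a live count is actually needed.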


re: Another small request re JIRA references in email

2011-10-18 Thread Gaojinchao
+1. This is a good working style.

-----Original Message-----
From: Ramkrishna S Vasudevan [mailto:ramakrish...@huawei.com] 
Sent: October 18, 2011 12:07
To: dev@hbase.apache.org
Subject: RE: Another small request re JIRA references in email


+1 on this Todd. :)
-Original Message-
From: Todd Lipcon [mailto:t...@cloudera.com] 
Sent: Monday, October 17, 2011 10:19 PM
To: dev
Subject: Another small request re JIRA references in email

Hi folks,

First, thanks to everyone for indulging my request last month about
keeping number of commits per JIRA down where possible. It's been
easier to follow development in my opinion.

So now onto another small request: there have been a lot of emails
recently that look like "Hi all. Please review HBASE-12345. Patch is
ready. etc." Could I request that the email also include a few words
of description, so it's easier to follow these messages from the inbox
without having to navigate to the appropriate JIRA? For example, "A
patch for HBASE-1234 (ACID semantics issue) is ready. Please review".

Thanks!

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera



Created a invalid Zk node

2011-10-11 Thread Gaojinchao
The logs below show that we created an invalid zk node when restarting a cluster.
We mistakenly believed that the regions belonged to a dead server.

Can I open an issue?

2011-10-11 05:05:29,127 INFO org.apache.hadoop.hbase.master.HMaster: Meta 
updated status = true
2011-10-11 05:05:29,127 INFO org.apache.hadoop.hbase.master.HMaster: ROOT/Meta 
already up-to date with new HRI.
2011-10-11 05:05:29,151 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
771d63e9327383159553619a4f2dc74f with OFFLINE state
2011-10-11 05:05:29,161 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
3cf860dd323fe6360f571aeafc129f95 with OFFLINE state
2011-10-11 05:05:29,170 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
4065350214452a9d5c55243c734bef08 with OFFLINE state
2011-10-11 05:05:29,178 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
4e81613f82a39fc6e5e89f96e7b3ccc4 with OFFLINE state
2011-10-11 05:05:29,187 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
e21b9e1545a28953aba0098fda5c9cd9 with OFFLINE state
2011-10-11 05:05:29,195 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
5cd9f55eecd43d088bbd505f6795131f with OFFLINE state
2011-10-11 05:05:29,229 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
db5f641452a70b09b85a92970e4198c7 with OFFLINE state
2011-10-11 05:05:29,237 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
a7b20a653919e7f41bfb2ed349af7d21 with OFFLINE state
2011-10-11 05:05:29,253 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Creating (or updating) unassigned node for 
c9385619425f737eab1a6624d2e097a8 with OFFLINE state

// We cleaned all the zk nodes.
2011-10-11 05:05:29,262 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Clean cluster startup. Assigning userregions
2011-10-11 05:05:29,262 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Deleting any existing unassigned nodes
2011-10-11 05:05:29,367 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Bulk assigning 9 region(s) across 1 server(s), retainAssignment=true
2011-10-11 05:05:29,369 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Timeout-on-RIT=9000
2011-10-11 05:05:29,369 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Bulk assigning 9 region(s) to C3S3,54366,1318323920153
2011-10-11 05:05:29,369 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Bulk assigning done
2011-10-11 05:05:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
771d63e9327383159553619a4f2dc74f with OFFLINE state
2011-10-11 05:05:29,371 INFO org.apache.hadoop.hbase.master.HMaster: Master has 
completed initialization
2011-10-11 05:05:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
3cf860dd323fe6360f571aeafc129f95 with OFFLINE state
2011-10-11 05:05:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
4065350214452a9d5c55243c734bef08 with OFFLINE state
2011-10-11 05:05:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
4e81613f82a39fc6e5e89f96e7b3ccc4 with OFFLINE state
2011-10-11 05:05:29,371 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
e21b9e1545a28953aba0098fda5c9cd9 with OFFLINE state
2011-10-11 05:05:29,372 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
5cd9f55eecd43d088bbd505f6795131f with OFFLINE state
2011-10-11 05:05:29,372 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
db5f641452a70b09b85a92970e4198c7 with OFFLINE state
2011-10-11 05:05:29,372 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
a7b20a653919e7f41bfb2ed349af7d21 with OFFLINE state
2011-10-11 05:05:29,372 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:58198-0x132f23a9a38 Async create of unassigned node for 
c9385619425f737eab1a6624d2e097a8 with OFFLINE state


Re: Please welcome our newest committer, Lars Hofhansl

2011-10-07 Thread Gaojinchao
Congratulation!

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: October 8, 2011 0:53
To: HBase Dev List
Subject: Please welcome our newest committer, Lars Hofhansl

One of us!

As per tradition, your first commit should be adding yourself to the
pom.xml Lars in the committers section -- and don't break the build!
(smile)

St.Ack


Re: HBASE-4212

2011-09-29 Thread Gaojinchao
Thanks to you and Ram.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: September 30, 2011 0:28
To: dev@hbase.apache.org
Subject: HBASE-4212

Hi,
Jinchao, Ramkrishna and I have verified that the patch for HBASE-4212 is
good.

I plan to commit to 0.90 branch tomorrow.
If you have comments, please share.

Thanks


: Please welcome Ramkrishna S. Vasudevan, our newest hbase committer

2011-09-28 Thread Gaojinchao
Congrats!

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: September 29, 2011 0:15
To: HBase Dev List
Subject: Please welcome Ramkrishna S. Vasudevan, our newest hbase committer

Please welcome Ramkrishna, our newest hbase committer.  Ram has been
going great guns fixing ugly hbase bugs with a good while now.  I'm
glad he's on board.

Good on you Ram,
St.Ack


Re: maintaining stable HBase build

2011-09-25 Thread Gaojinchao
+1. We should run all test cases before submitting a patch, and then state this 
fact on the JIRA as well.

I will do it.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: September 24, 2011 18:52
To: dev@hbase.apache.org
Subject: maintaining stable HBase build

Hi,
I want to bring the importance of maintaining stable HBase build to our
attention.
A stable HBase build is important, not just for the next release but also
for authors of the pending patches to verify the correctness of their work.

At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds were all
blue. Now they're all red.

I don't mind fixing Jenkins build. But if we collectively adopt some good
practice, it would be easier to achieve the goal of having stable builds.

For contributors, I understand that it takes so much time to run whole test
suite that he/she may not have the luxury of doing this - Apache Jenkins
wouldn't do it when you press Submit Patch button.
If this is the case (let's call it scenario A), please use Eclipse (or other
tool) to identify tests that exercise the classes/methods in your patch and
run them. Also clearly state what tests you ran in the JIRA.

If you have a Linux box where you can run whole test suite, it would be nice
to utilize such resource and run whole suite. Then please state this fact on
the JIRA as well.
Considering Todd's suggestion of holding off commit for 24 hours after code
review, 2 hour test run isn't that long.

Sometimes you may see the following (from 0.92 build 18):

Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1:51:41.797s

You should examine the test summary above these lines and find out
which test(s) hung. For this case it was TestMasterFailover:

Running org.apache.hadoop.hbase.master.TestMasterFailover
Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec

I think a script should be developed that parses test output and
identifies hanging test(s).

For scenario A, I hope committer would run test suite.
The net effect would be a statement on the JIRA, saying all tests passed.

Your comments/suggestions are welcome.
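
The hang-detection idea above, pairing each surefire "Running ..." line with 
its "Tests run: ..." summary, could be sketched as follows. This is a 
hypothetical helper written for illustration, not an existing HBase tool; 
the findHanging method and the sample log lines are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HangingTestFinder {
    // Collects tests that printed a "Running ..." line but no matching
    // "Tests run: ..." summary line, which is the signature of a hung
    // (or killed) test in Maven surefire console output.
    public static List<String> findHanging(List<String> lines) {
        List<String> hanging = new ArrayList<>();
        String pending = null;
        for (String line : lines) {
            if (line.startsWith("Running ")) {
                // Two "Running" lines in a row: the earlier test never
                // reported a summary, so flag it as hung.
                if (pending != null) hanging.add(pending);
                pending = line.substring("Running ".length());
            } else if (line.startsWith("Tests run:")) {
                pending = null; // the pending test completed
            }
        }
        if (pending != null) hanging.add(pending); // hung at end of log
        return hanging;
    }

    public static void main(String[] args) {
        // Mirrors the 0.92 build 18 excerpt quoted above.
        List<String> log = Arrays.asList(
            "Running org.apache.hadoop.hbase.master.TestMasterFailover",
            "Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable",
            "Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec");
        System.out.println(findHanging(log));
    }
}
```

On the quoted excerpt this flags TestMasterFailover, matching the manual 
diagnosis above. Note the sketch assumes sequential forked output; interleaved 
parallel forks would need per-fork bookkeeping.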


Re: A issue about region server startup.

2011-09-06 Thread Gaojinchao
26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: hdfsDate
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: hdfsUrl
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: date
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: hdfsRevision
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: user
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: hdfsVersion
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: url
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: MetricsString 
added: version
2011-08-26 07:54:03,418 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-08-26 07:54:03,419 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-08-26 07:54:03,419 INFO 
org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics: Initialized
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_OPEN_REGION-linux-kqm6,20020,1314316432465, 
corePoolSize=32, maxPoolSize=32
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_OPEN_ROOT-linux-kqm6,20020,1314316432465, 
corePoolSize=1, maxPoolSize=1
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_OPEN_META-linux-kqm6,20020,1314316432465, 
corePoolSize=1, maxPoolSize=1
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_CLOSE_REGION-linux-kqm6,20020,1314316432465, 
corePoolSize=3, maxPoolSize=3
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_CLOSE_ROOT-linux-kqm6,20020,1314316432465, 
corePoolSize=1, maxPoolSize=1
2011-08-26 07:54:03,422 DEBUG org.apache.hadoop.hbase.executor.ExecutorService: 
Starting executor service name=RS_CLOSE_META-linux-kqm6,20020,1314316432465, 
corePoolSize=1, maxPoolSize=1
2011-08-26 07:54:03,505 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-08-26 07:54:03,563 INFO org.apache.hadoop.http.HttpServer: Port returned 
by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 20030
2011-08-26 07:54:03,564 INFO org.apache.hadoop.http.HttpServer: 
listener.getLocalPort() returned 20030 
webServer.getConnectors()[0].getLocalPort() returned 20030
2011-08-26 07:54:03,564 INFO org.apache.hadoop.http.HttpServer: Jetty bound to 
port 20030
2011-08-26 07:54:03,564 INFO org.mortbay.log: jetty-6.1.26
2011-08-26 07:54:03,856 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:20030

// RPC init finished; the server started to provide services
2011-08-26 07:54:03,858 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
listener on 20020: starting
2011-08-26 07:54:03,858 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder: starting


-----Original Message-----
From: Ramkrishna S Vasudevan [mailto:ramakrish...@huawei.com] 
Sent: September 6, 2011 21:08
To: dev@hbase.apache.org
Subject: RE: A issue about region server startup.

Hi Gao

That is because of the timeout period, which is set to a default value of 30
mins.

But why was the initialization of the service slow? Maybe there was an exception
while trying to open the root region?

If that is the case, HBASE-4287 is one solution.
Without HBASE-4287 surely it will take 30 mins.
Regards
Ram

-Original Message-----
From: Gaojinchao [mailto:gaojinc...@huawei.com] 
Sent: Tuesday, September 06, 2011 5:27 PM
To: dev@hbase.apache.org
Subject: A issue about region server startup.

In my cluster, I found an issue where the root region was assigned only after
30 minutes of cluster startup, because the region server started up slowly.

Region server startup:

Step 1: report to the master that it has started up.

Step 2: initialize the service



HMaster startup:

Step 1: wait for the region server.

Step 2: assign the root and meta.



Because of this sequence, if the region server initializes its service slowly,
the master may get the exception "Server is not running yet".

The root can't be assigned and has to wait for 30 minutes.



Is there a good way to fix this bug? Can anyone give me some suggestions?



The logs:

2011-08-26 07:53:26,065 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: No previous transition
plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052
so generated a random one; hri=-ROOT-,,0.70236052, src=,
dest=linux-kqm6,20020,1314316432465; 1 (online=1, exclude=null) available
servers

2011-08-26 07:53:26,065 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
-ROOT-,,0.70236052 to linux-kqm6,20020,1314316432465

2011-08-26 07:53:26,065 DEBUG org.apache.hadoop.hbase.master.ServerManager:
New connection to linux-kqm6,20020,13143164

A issue about region server startup.

2011-09-06 Thread Gaojinchao
In my cluster, I found an issue where the root region was assigned only after 
30 minutes of cluster startup, because the region server started up slowly.

Region server startup:

Step 1: report to the master that it has started up.

Step 2: initialize the service



HMaster startup:

Step 1: wait for the region server.

Step 2: assign the root and meta.



Because of this sequence, if the region server initializes its service slowly, 
the master may get the exception "Server is not running yet".

The root can't be assigned and has to wait for 30 minutes.



Is there a good way to fix this bug? Can anyone give me some suggestions?



The logs:

2011-08-26 07:53:26,065 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
-ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, 
dest=linux-kqm6,20020,1314316432465; 1 (online=1, exclude=null) available 
servers

2011-08-26 07:53:26,065 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region -ROOT-,,0.70236052 to linux-kqm6,20020,1314316432465

2011-08-26 07:53:26,065 DEBUG org.apache.hadoop.hbase.master.ServerManager: New 
connection to linux-kqm6,20020,1314316432465

2011-08-26 07:53:33,251 WARN 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
RemoteException connecting to RS

org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet

 at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)



 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)

 at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)

 at $Proxy6.getProtocolVersion(Unknown Source)

 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)

 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)

 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)

 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)

 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)

 at 
org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:620)

 at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:555)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1043)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1304)

 at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:431)

 at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)

 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)

2011-08-26 07:53:33,254 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment of -ROOT-,,0.70236052 to 
serverName=linux-kqm6,20020,1314316432465, load=(requests=0, regions=0, 
usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0

org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet

 at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)



 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)

 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

 at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)

 at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:977)

 at 
org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:620)

 at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:555)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1043)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)

 at 
org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1304)

 at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:431)

 at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(
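
One common shape for a fix is to treat ServerNotRunningException as retryable 
with a short, bounded backoff instead of falling through to the 30-minute 
timeout monitor. The sketch below only illustrates that pattern; Probe, 
isServerReady, and the retry parameters are invented for the example and are 
not the actual HBASE-4287 change.

```java
public class ReadyRetry {
    // Hypothetical stand-in for the real "is the RS ready?" RPC.
    interface Probe { boolean isServerReady() throws Exception; }

    // Probe repeatedly with a fixed short sleep, swallowing the
    // "Server is not running yet" style failures, up to a bound.
    static boolean waitUntilReady(Probe probe, int attempts, long sleepMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                if (probe.isServerReady()) return true;
            } catch (Exception e) {
                // e.g. ServerNotRunningException: the RPC port is up but
                // service initialization has not finished yet; retry soon.
            }
            Thread.sleep(sleepMs);
        }
        return false; // give up after the bounded number of attempts
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        // Simulated region server that becomes ready on the third probe.
        Probe probe = () -> {
            calls[0]++;
            if (calls[0] < 3) throw new Exception("Server is not running yet");
            return true;
        };
        System.out.println(waitUntilReady(probe, 5, 10));
    }
}
```

With such a loop, a region server that finishes initialization a few seconds 
after registering would receive the -ROOT- assignment promptly rather than 
after the timeout-monitor period.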

Re: New HBase Logo

2011-09-05 Thread Gaojinchao
I agree; there is so much controversy in the community. 

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: September 5, 2011 22:13
To: dev@hbase.apache.org
Subject: Re: New HBase Logo

May we pay more attention to http://s.apache.org/x4 ?

As HBase matures, we can get new logo.

On Sun, Sep 4, 2011 at 10:02 PM, Ravi Veeramachaneni <
ravi.veeramachan...@gmail.com> wrote:

> Stack,
>
> Just curious, can you share the rest of the contenders?
>
> Ravi
>
> On Thu, Sep 1, 2011 at 12:30 PM, Gary Helmling 
> wrote:
>
> > Thanks Stack for organizing and herding this along, and thanks
> StumbleUpon
> > for sponsoring it!
> >
> >
> > On Wed, Aug 31, 2011 at 9:30 PM, Stack  wrote:
> >
> > > Your PMC voted the following as the new HBase logo:
> > >
> > >  https://issues.apache.org/jira/secure/attachment/12492477/01.jpg
> > >
> > > It will replace the now retired bass clef that has served us well down
> > > through the years.
> > >
> > > We hope you like it.
> > >
> > > The process by which we came up with this design was long and
> > > torturous.  Hopefully we will not have to ever repeat it.  At one time
> > > the thought was to have the broad hbase community vote on the logo but
> > > getting your PMC just to agree on a small set of contenders was like
> > > pulling teeth; painful all around (engineers and design == not such a
> > > good mix).  May you all forgive us for keeping the decision committee
> > > small.
> > >
> > > We'll roll it out in the next day or so.
> > >
> > > A new website redesign is also on its way.  Hopefully that will show
> > > up soon too.  We'll keep you posted.
> > >
> > > Yours,
> > > St.Ack
> > >
> > > P.S. Thank you to dezinden on 99designs for the winning logo
> > > P.P.S Thanks to StumbleUpon for forking over the dollars to fund the
> > > 99designs competition
> > >
> >
>


About failover

2011-08-28 Thread Gaojinchao
Version: trunk.
When we start up a cluster, processDeadServers shouldn't be called;
otherwise we will create a lot of useless zk nodes.

As follows:
void joinCluster() throws IOException, KeeperException, InterruptedException {


Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers =
  rebuildUserRegions();

// When starting up the cluster, all regions are considered dead ones.
// We will create an OFFLINE zk node for each region, but these nodes
// are cleared in processRegionsInTransition.
processDeadServers(deadServers);

// Determine whether this is a failover. If it is not, we will clean
// all the zk nodes.
processRegionsInTransition(deadServers);
  }
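
A toy model of the reordering being proposed: decide whether this is a failover 
before creating OFFLINE nodes, so a clean startup creates nothing that later 
needs cleanup. All names below are illustrative stand-ins, not the real 
AssignmentManager API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StartupVsFailover {
    // Only "create" OFFLINE unassigned znodes for regions of actually dead
    // servers, i.e. when this join is a failover, not a clean startup.
    static List<String> joinCluster(Map<String, List<String>> deadServers,
                                    boolean regionsInTransition) {
        List<String> offlineNodesCreated = new ArrayList<>();
        boolean failover = regionsInTransition && !deadServers.isEmpty();
        if (failover) {
            for (List<String> regions : deadServers.values()) {
                // The real code would create an OFFLINE znode per region here.
                offlineNodesCreated.addAll(regions);
            }
        }
        // On a clean startup nothing was created, so nothing needs cleanup.
        return offlineNodesCreated;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dead = new HashMap<>();
        dead.put("rs1", Arrays.asList("region-a", "region-b"));
        System.out.println(joinCluster(dead, false).size()); // clean startup
        System.out.println(joinCluster(dead, true).size());  // failover
    }
}
```

The point is purely the ordering: the failover decision gates the znode 
creation, instead of creating znodes for every region and deleting them again.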




HBASE-4124 & HBASE-4212

2011-08-22 Thread Gaojinchao
Your review/comment on HBASE-4124 & HBASE-4212 is welcome.

Thanks.



Re: Build failed in Jenkins: HBase-TRUNK #2116

2011-08-15 Thread Gaojinchao
You are right. I used root to do it.

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: August 16, 2011 11:59
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK #2116

How do you mean Gao?  If the config is that the jenkins user has a max
of 1024 open files on the machine, unless we are root, there is no way
for us to up the ulimit.  Or do I misunderstand?

Thanks,
St.Ack

2011/8/15 Gaojinchao :
> Sorry. I mean we can modify the startup script so that the process can 
> inherit the property. Rather than by modifying the parameters of the 
> operating system.
>
>
> -----Original Message-----
> From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
> Sent: August 16, 2011 11:35
> To: dev@hbase.apache.org
> Subject: Re: Build failed in Jenkins: HBase-TRUNK #2116
>
> Because I think the ceiling is default 1024 and though it says
> unlimited, its just 1024 (as it is stock linux IIRC).
> St.Ack
>
> 2011/8/15 Gaojinchao :
>> Why don't try to use ulimit -SHn in test case.
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: August 16, 2011 11:10
>> To: dev@hbase.apache.org
>> Subject: Fwd: Build failed in Jenkins: HBase-TRUNK #2116
>>
>> From:
>> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testOrphanLogCreation/
>>
>> Caused by: java.io.IOException: Too many open files
>>at sun.nio.ch.IOUtil.initPipe(Native Method)
>>at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:49)
>>
>> FYI
>>
>


re: Build failed in Jenkins: HBase-TRUNK #2116

2011-08-15 Thread Gaojinchao
Sorry, I mean we can modify the startup script so that the process inherits
the property, rather than changing the operating system's parameters.
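As a hedged sketch of that idea (an illustrative script fragment and limit value, not HBase's actual bin/ scripts), the startup script can raise the limit itself so the JVM it launches inherits it:

```shell
#!/bin/sh
# Raise the soft and hard open-file limits for this shell; any process
# started from here (e.g. the test JVM) inherits them. This succeeds
# only if the hard limit permits it or the script runs as root.
ulimit -SHn 32768 2>/dev/null || echo "warning: could not raise fd limit" >&2

# Show the limit the launched process will actually inherit.
echo "open-file limit: $(ulimit -n)"
```

This keeps the raised limit scoped to the processes the script starts, instead of depending on machine-wide configuration.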


-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: August 16, 2011 11:35
To: dev@hbase.apache.org
Subject: Re: Build failed in Jenkins: HBase-TRUNK #2116

Because I think the ceiling is default 1024 and though it says
unlimited, its just 1024 (as it is stock linux IIRC).
St.Ack

2011/8/15 Gaojinchao :
> Why don't try to use ulimit -SHn in test case.
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: August 16, 2011 11:10
> To: dev@hbase.apache.org
> Subject: Fwd: Build failed in Jenkins: HBase-TRUNK #2116
>
> From:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testOrphanLogCreation/
>
> Caused by: java.io.IOException: Too many open files
>at sun.nio.ch.IOUtil.initPipe(Native Method)
>at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:49)
>
> FYI
>


Re: Build failed in Jenkins: HBase-TRUNK #2116

2011-08-15 Thread Gaojinchao
Why not try using ulimit -SHn in the test case?

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: August 16, 2011 11:10
To: dev@hbase.apache.org
Subject: Fwd: Build failed in Jenkins: HBase-TRUNK #2116

From:
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testOrphanLogCreation/

Caused by: java.io.IOException: Too many open files
at sun.nio.ch.IOUtil.initPipe(Native Method)
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:49)

FYI


Re: A question about Master failover

2011-08-09 Thread Gaojinchao
Thanks your reply.

I am reading the code for the HMaster failover process. It seems to be a bug.
I will try to reproduce this case and confirm it.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: August 9, 2011 17:20
To: dev@hbase.apache.org
Subject: Re: A question about Master failover

Can you show us the log snippet containing the stack trace ?

Thanks



On Aug 9, 2011, at 2:15 AM, Gaojinchao  wrote:

> 
> In function rebuildUserRegions,  The region that regionLocation is null is 
> put into regions set. It seems a bug.
> It will throw exception when DisableTableHandler processes it.  Skipping this 
> region seems better.
> 
> code:
> if (regionLocation == null) {
>// Region not being served, add to region map with no assignment
>// If this needs to be assigned out, it will also be in ZK as RIT
>// add if the table is not in disabled state
>if (false == checkIfRegionBelongsToDisabled(regionInfo)) {
>  this.regions.put(regionInfo, null);
>}
>if (checkIfRegionBelongsToDisabling(regionInfo)) { 
> //Should we skip this region?
>  disablingTables.add(disablingTableName);
>}
> } else if (!serverManager.isServerOnline(regionLocation.getServerName())) {


A question about Master failover

2011-08-09 Thread Gaojinchao

In rebuildUserRegions, a region whose regionLocation is null is put into the
regions set. This seems to be a bug: an exception is thrown when
DisableTableHandler processes the region. Skipping this region seems better.

code:
if (regionLocation == null) {
// Region not being served, add to region map with no assignment
// If this needs to be assigned out, it will also be in ZK as RIT
// add if the table is not in disabled state
if (false == checkIfRegionBelongsToDisabled(regionInfo)) {
  this.regions.put(regionInfo, null);
}
if (checkIfRegionBelongsToDisabling(regionInfo)) { 
//Should we skip this region?
  disablingTables.add(disablingTableName);
}
} else if (!serverManager.isServerOnline(regionLocation.getServerName())) {


Re: HBASE-4142?

2011-07-26 Thread Gaojinchao
In my experience, some regions in the list may be blocked, which affects the
throughput. Is that acceptable?

-----Original Message-----
From: Doug Meil [mailto:doug.m...@explorysmedical.com]
Sent: July 27, 2011 10:15
To: dev@hbase.apache.org
Subject: HBASE-4142?

Hi there-

I just saw this in the build message…

 HBASE-4142 Advise against large batches in javadoc for HTable#put(List)

… And I was curious as to why this was a bad thing.  We do this and it actually 
is quite helpful (in concert with an internal utility class that later became 
HTableUtil).



Doug Meil
Chief Software Architect, Explorys
doug.m...@explorys.com



Create table threw NullPointerException

2011-07-14 Thread Gaojinchao
It happened on the latest 0.90 branch, but I can't reproduce it.

It seems better to use the getHRegionInfoOrNull API, or to check the input
parameter before calling getHRegionInfo.

Code:
  public static Writable getWritable(final byte [] bytes, final Writable w)
  throws IOException {
return getWritable(bytes, 0, bytes.length, w);
  }
return getWritable(bytes, 0, bytes.length, w);  // It seems the input parameter bytes is null here

logs:
11/07/15 10:15:42 INFO zookeeper.ClientCnxn: Socket connection established to 
C4C3.site/157.5.100.3:2181, initiating session
11/07/15 10:15:42 INFO zookeeper.ClientCnxn: Session establishment complete on 
server C4C3.site/157.5.100.3:2181, sessionid = 0x2312b8e3f72, negotiated 
timeout = 18
[INFO] Create : ufdr111 222!
[INFO] Create : ufdr111 start!
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at 
org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
at 
org.apache.hadoop.hbase.client.HBaseAdmin$1.processRow(HBaseAdmin.java:306)
at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:190)
at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:73)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:325)
at createTable.main(createTable.java:96)
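A hedged sketch of the suggested guard (hypothetical names; parse() merely stands in for Writables.getWritable/getHRegionInfo, which dereference their byte[] argument):

```java
public class SafeParse {
    // Stand-in for a deserializer that assumes non-null input and
    // would throw a NullPointerException otherwise.
    static String parse(byte[] bytes) {
        return new String(bytes);
    }

    // "getHRegionInfoOrNull" style: validate the input first.
    static String parseOrNull(byte[] bytes) {
        return (bytes == null || bytes.length == 0) ? null : parse(bytes);
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull(null));                // prints "null", no NPE
        System.out.println(parseOrNull("region".getBytes())); // prints "region"
    }
}
```

The caller (here, the meta-scan visitor) then decides how to handle the null, instead of crashing inside the deserializer.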


Re: We should push out a 0.90.4?

2011-07-13 Thread Gaojinchao
I have been using the latest branch for my test cluster and verifying it.

-----Original Message-----
From: Joey Echeverria [mailto:j...@cloudera.com]
Sent: July 14, 2011 5:50
To: dev@hbase.apache.org
Subject: Re: We should push out a 0.90.4?

On Wed, Jul 13, 2011 at 1:54 PM, Stack  wrote:
> HBASE-3872 has a patch (needs review) and once it goes in I think we
> have enough meat for a 0.90.4.

+1

-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


A question about zookeeper.session.timeout

2011-07-12 Thread Gaojinchao
The HBase book recommends that zookeeper.session.timeout defaults to 60s,
but the default value configured in hbase-default.xml is 18000.

Should we modify hbase-default.xml or the book?


Hbase Book:
13.6.2.6. ZooKeeper SessionExpired events

If you wish to increase the session timeout, add the following to your 
hbase-site.xml to increase the timeout from the default of 60 seconds to 120 
seconds.

<property>
  <name>zookeeper.session.timeout</name>
  <value>120</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>

Be aware that setting a higher timeout means that the regions served by a 
failed RegionServer will take at least that amount of time to be transfered to 
another RegionServer. For a production system serving live requests, we would 
instead recommend setting it lower than 1 minute and over-provisioning your
cluster in order to lower the memory load on each machine (hence having less
garbage to collect per machine).



Hbase-default.xml:
  
<property>
  <name>zookeeper.session.timeout</name>
  <value>18</value>
  <description>ZooKeeper session timeout.
  HBase passes this to the zk quorum as suggested maximum time for a
  session.  See
  http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
  "The client sends a requested timeout, the server responds with the
  timeout that it can give the client."
  In milliseconds.
  </description>
</property>


Re: Hmaster crashes caused by splitting log.

2011-06-23 Thread Gaojinchao
Thanks Ted.

I didn't see that in the log. I have filed an issue (HBASE-4028) and uploaded
the log. I will make a patch; please review it.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: June 24, 2011 13:24
To: dev@hbase.apache.org
Subject: Re: Hmaster crashes caused by splitting log.

In doRun():
try {
  writeBuffer(buffer);
} finally {
  entryBuffers.doneWriting(buffer);
}
totalBuffered being negative implies that there were exceptions in
writeBuffer().
Did you see the following in log ?
e = RemoteExceptionHandler.checkIOException(e);
LOG.fatal(this.getName() + " Got while writing log entry to log",
e);

Thanks

On Thu, Jun 23, 2011 at 10:10 PM, Gaojinchao  wrote:

> I am some doubt. It seems thrown object isn't null ,But , why the top can't
> catch?
> My guess is Thread re-entrant and add some logs. It show that value of
> totalBuffered is -565832
>
>
> hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used
> -540664
> hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used
> -540664release size68872
> hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used
> -565832
>
>
>
>


Re: Hmaster crashes caused by splitting log.

2011-06-23 Thread Gaojinchao
I have some doubts. It seems the thrown object isn't null, but then why can't
the top level catch it?
My guess is that the thread is re-entrant, so I added some logs. They show
that the value of totalBuffered is -565832.


hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -540664
hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used 
-540664release size68872
hbase-root-master-157-5-111-21.log:2011-06-24 10:29:52,119 WARN 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: gjc:release Used -565832





Re: Hmaster crashes caused by splitting log.

2011-06-22 Thread Gaojinchao
Thanks, You are right.  :) 

I will test it.


-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: June 23, 2011 11:00
To: dev@hbase.apache.org
Subject: Re: Hmaster crashes caused by splitting log.

I guess you have seen this call directly below the following code snippet:
  checkForErrors();
I think checking thrown should be kept in the condition.

I assume we can lift the above call before dataAvailable.notifyAll()

My 2 cents.

On Wed, Jun 22, 2011 at 7:00 PM, Gaojinchao  wrote:

>
> Because Master usually uses little memory. So its
> memory is 4G.
>
> DFS block is 256M and hbase.regionserver.maxlogs is 32. One region server
> can save max 8G Hlog.
>
> In my performance cluster(0.90.3), The Hmaster memory from 100 M up to 4G
> when one region server crashed.
>
> I dug it and found the flow control does not work when write thread is
> normal.
>
>
> // If we crossed the chunk threshold, wait for more space to be available
>  synchronized (dataAvailable) {
>while (totalBuffered > maxHeapUsage && thrown == null) {
>  LOG.debug("Used " + totalBuffered + " bytes of buffered edits,
> waiting for IO threads...");
>  dataAvailable.wait(3000);
>}
>dataAvailable.notifyAll();
>  }
>
> If the code is below. It seems better.
>
> // If we crossed the chunk threshold, wait for more space to be available
>  synchronized (dataAvailable) {
>while (totalBuffered > maxHeapUsage) {
>  LOG.debug("Used " + totalBuffered + " bytes of buffered edits,
> waiting for IO threads...");
>  dataAvailable.wait(3000);
>}
>dataAvailable.notifyAll();
>  }
>
>
>
>


Hmaster crashes caused by splitting log.

2011-06-22 Thread Gaojinchao

Because the Master usually uses little memory, its heap is 4G.

The DFS block size is 256M and hbase.regionserver.maxlogs is 32, so one region
server can keep at most 8G of HLogs.

In my performance cluster (0.90.3), the HMaster memory went from 100M up to 4G
when one region server crashed.

I dug into it and found that the flow control does not work even when the
writer threads are normal.


// If we crossed the chunk threshold, wait for more space to be available
  synchronized (dataAvailable) {
while (totalBuffered > maxHeapUsage && thrown == null) {
  LOG.debug("Used " + totalBuffered + " bytes of buffered edits, 
waiting for IO threads...");
  dataAvailable.wait(3000);
}
dataAvailable.notifyAll();
  }

If the code is as below, it seems better.

// If we crossed the chunk threshold, wait for more space to be available
  synchronized (dataAvailable) {
while (totalBuffered > maxHeapUsage) {
  LOG.debug("Used " + totalBuffered + " bytes of buffered edits, 
waiting for IO threads...");
  dataAvailable.wait(3000);
}
dataAvailable.notifyAll();
  }
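A hedged, standalone sketch of the proposed behavior (a simplified producer/consumer, not HLogSplitter's actual code; the cap, chunk size, and names are illustrative): the reading side waits strictly while the buffered total exceeds the cap, and the writing side wakes it after draining.

```java
public class Backpressure {
    static final long MAX_HEAP_USAGE = 1024;   // cap on buffered bytes
    static final Object dataAvailable = new Object();
    static long totalBuffered = 0;

    // Reader side: block while adding this chunk would exceed the cap.
    static void reserve(long bytes) throws InterruptedException {
        synchronized (dataAvailable) {
            while (totalBuffered + bytes > MAX_HEAP_USAGE) {
                dataAvailable.wait(3000);      // timed wait, then recheck condition
            }
            totalBuffered += bytes;
        }
    }

    // Writer side: drain a chunk and wake the reader.
    static void release(long bytes) {
        synchronized (dataAvailable) {
            totalBuffered -= bytes;
            dataAvailable.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 20; i++) {
                try { Thread.sleep(5); } catch (InterruptedException e) { return; }
                release(256);
            }
        });
        writer.start();
        for (int i = 0; i < 20; i++) {
            reserve(256);                      // blocks whenever the cap is reached
        }
        writer.join();
        System.out.println("final totalBuffered = " + totalBuffered);  // prints 0
    }
}
```

Dropping the extra escape clause from the wait condition means the cap is always honored; error propagation then has to happen via a separate check (as checkForErrors() does in the real code) rather than by letting the reader run unthrottled.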





Re: HBASE-3855

2011-06-20 Thread Gaojinchao
Sorry, I didn't modify the directory. It is the latest version (0.90.4).
Trunk merged this code:
   public final static long FIXED_OVERHEAD = ClassSize.align(
-  ClassSize.OBJECT + (11 * ClassSize.REFERENCE));
+  ClassSize.OBJECT + (12 * ClassSize.REFERENCE));

The MemStore class added a member, reseekNumKeys, so the overhead should be
changed to:
public final static long FIXED_OVERHEAD = ClassSize.align(
  ClassSize.OBJECT + (11 * ClassSize.REFERENCE) + Bytes.SIZEOF_INT);

The branch also needs this merge. Thanks.
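A hedged illustration of the alignment arithmetic involved (align() mirrors the spirit of ClassSize.align; the 16-byte object header, 8-byte reference, and 4-byte int below are illustrative assumptions, not HBase's exact constants):

```java
public class AlignDemo {
    // Round up to the next multiple of 8, as JVM object sizes are.
    static long align(long num) {
        return (num + 7) & ~7L;
    }

    public static void main(String[] args) {
        long objectOverhead = 16;  // assumed object header size
        long reference = 8;        // assumed reference size
        long sizeofInt = 4;
        long before = align(objectOverhead + 11 * reference);             // without the int
        long after  = align(objectOverhead + 11 * reference + sizeofInt); // with reseekNumKeys
        System.out.println(before + " -> " + after);  // prints "104 -> 112"
    }
}
```

Forgetting the SIZEOF_INT term makes the declared FIXED_OVERHEAD smaller than the measured heap size, which is exactly what a TestHeapSize-style check catches.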



-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: June 21, 2011 0:36
To: dev@hbase.apache.org
Subject: Re: HBASE-3855

HBASE-3855 was integrated to 0.90 branch on Jun 10th.
Meaning, it's not in 0.90.2

Did TestHeapSize fail ?

Please provide more log/exception from the failed test.

On Mon, Jun 20, 2011 at 3:27 AM, Gaojinchao  wrote:

> My test case failed in branch 0.90. It seems a bug.
>
> MemStore overhead should be
>  public final static long FIXED_OVERHEAD = ClassSize.align(
>  ClassSize.OBJECT + (11 * ClassSize.REFERENCE) + Bytes.SIZEOF_INT);
>
> Please refer to
> /opt/g56562/hbase_svn/host_java/src/0.90.2/target/surefire-reports for the
> individual test results.
>at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
>at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>at
> org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>at
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
>at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>at
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>at
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>at
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
> Caused by: org.apache.maven.plugin.MojoFailureException: There are test
> failures.
>
>


HBASE-3855

2011-06-20 Thread Gaojinchao
My test case failed on branch 0.90. It seems to be a bug.

MemStore overhead should be
  public final static long FIXED_OVERHEAD = ClassSize.align(
  ClassSize.OBJECT + (11 * ClassSize.REFERENCE) + Bytes.SIZEOF_INT);

Please refer to 
/opt/g56562/hbase_svn/host_java/src/0.90.2/target/surefire-reports for the 
individual test results.
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoFailureException: There are test 
failures.



Hmaster is NullPointerException

2011-05-27 Thread Gaojinchao
Issue:
NullPointerException while HMaster is starting.
   java.lang.NullPointerException
 at java.util.TreeMap.getEntry(TreeMap.java:324)
 at java.util.TreeMap.get(TreeMap.java:255)
 at 
org.apache.hadoop.hbase.master.AssignmentManager.addToServers(AssignmentManager.java:1512)
 at 
org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:606)
 at 
org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:214)
 at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:402)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)

// below is my analysis
private void finishInitialization()
throws IOException, InterruptedException, KeeperException
{

/*
In this function, waitForRegionServers returns when some regionservers
check in. Those regionservers host some regions, but not the root
region or the meta region.
(The registered servers are 157-5-111-13, 157-5-111-14, 157-5-111-12.)
*/
int regionCount = this.serverManager.waitForRegionServers();

/*
In this function, we can see from the "assigned=0" log that
verifyRootRegionLocation/verifyMetaRegionLocation succeed, and that the
root regionserver and meta regionserver are running.
(The 157-5-111-11 server is running but not registered, and that server
hosts the root region and the meta region.)
*/
assignRootAndMeta();

if (regionCount == 0) {
  LOG.info("Master startup proceeding: cluster startup");
  this.assignmentManager.cleanoutUnassigned();
  this.assignmentManager.assignAllUserRegions();
} else {
  LOG.info("Master startup proceeding: master failover");

  /*
HServerInfo hsi =
this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
In processFailover, the getHServerInfo call returns a NULL pointer, so
hsi is a NULL pointer, because the regionserver hosting the meta region
has not registered.
  */
  this.assignmentManager.processFailover();
}
}

the exception logs:

2011-05-21 14:44:47,973 INFO org.apache.hadoop.hbase.master.ServerManager: 
Waiting on regionserver(s) to checkin
2011-05-21 14:44:49,473 INFO org.apache.hadoop.hbase.master.ServerManager: 
Waiting on regionserver(s) to checkin
2011-05-21 14:44:50,974 INFO org.apache.hadoop.hbase.master.ServerManager: 
Waiting on regionserver(s) to checkin
2011-05-21 14:44:51,281 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=157-5-111-13,20020,1305877624933, regionCount=1543, 
userLoad=true
2011-05-21 14:44:51,722 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=157-5-111-14,20020,1305877627727, regionCount=1507, 
userLoad=true
2011-05-21 14:44:51,774 INFO org.apache.hadoop.hbase.master.ServerManager: 
Registering server=157-5-111-12,20020,1305877626108, regionCount=1521, 
userLoad=true
2011-05-21 14:44:52,474 INFO org.apache.hadoop.hbase.master.ServerManager: 
Waiting on regionserver(s) count to settle; currently=3
2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager: 
Finished waiting for regionserver count to settle; count=3, sleptFor=18000
2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager: 
Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of 
regions out on cluster=4571
2011-05-21 14:44:53,978 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
Log folder 
hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161 doesn't 
belong to a known region server, splitting
2011-05-21 14:44:53,991 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 2 hlog(s) in 
hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,1305875930161
2011-05-21 14:44:53,992 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-0,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-2,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-3,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-1,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-6,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-5,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-4,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread 
Thread[WriterThread-8,5,main]: starting
2011-05-21 14:44:53,993 DEBUG 
org.apache.hadoop.hbase.regionserver.

It seems a little bugs for log4j.properties

2011-05-19 Thread Gaojinchao
Misspelt:
log4j.threshhold=ALL should be log4j.threshold=ALL.



Re: about HBaseAdmin

2011-05-18 Thread Gaojinchao
In my case:
The HBase cluster and the HDFS cluster share the ZK cluster (so it is the case
of "only when you have 30 connections", or lower than 2000).

It seems this code has an issue:
  this.conf = HBaseConfiguration.create(c); // creates a new instance in the 0.90.3 version

In getConnection, a new HConnectionImplementation is created, so it should be
closed when the HBaseAdmin is done. That looks better.

public static HConnection getConnection(Configuration conf)
  throws ZooKeeperConnectionException {
HConnectionImplementation connection;
synchronized (HBASE_INSTANCES) {
  connection = HBASE_INSTANCES.get(conf); // keyed by the Configuration object's hashCode
  if (connection == null) {
connection = new HConnectionImplementation(conf);
HBASE_INSTANCES.put(conf, connection);
  }
}
return connection;
  }
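The pitfall can be reproduced with stubs (a hedged sketch: Conf and Connection are stand-ins, not HBase classes; the point is only that a map keyed by the Configuration object treats every fresh copy as a new key):

```java
import java.util.HashMap;
import java.util.Map;

public class ConnCache {
    static class Conf { }            // stand-in for Configuration
    static class Connection { }      // stand-in for HConnection
    static final Map<Conf, Connection> INSTANCES = new HashMap<>();

    static Connection getConnection(Conf c) {
        synchronized (INSTANCES) {
            // Keyed by the Conf object's identity hashCode: a fresh
            // copy of an identical configuration is still a new key.
            return INSTANCES.computeIfAbsent(c, k -> new Connection());
        }
    }

    public static void main(String[] args) {
        Conf shared = new Conf();
        System.out.println(getConnection(shared) == getConnection(shared));     // true  (cache hit)
        System.out.println(getConnection(shared) == getConnection(new Conf())); // false (new entry)
        System.out.println("cached connections: " + INSTANCES.size());          // 2
    }
}
```

This is why an HBaseAdmin that internally copies its Configuration ends up owning a private connection, which then has to be cleaned up with deleteConnection.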


-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: May 19, 2011 12:23
To: dev@hbase.apache.org
Cc: Chenjian
Subject: Re: about HBaseAdmin

Gao:

So, each time you create an HBaseAdmin with a shared Configuration,
you get the below exception?  Or not every time but only when you have
30 connections?  What do you think changed things in 0.90.3?  Was it "
  HBASE-3734  HBaseAdmin creates new configurations in
getCatalogTracker"

Thanks,
St.Ack



On Wed, May 18, 2011 at 6:48 PM, Gaojinchao  wrote:
> The api HBaseAdmin has modified, So we should add some introduce:
>
> If new a instance , it needs delete connection.
>
> eg:
> HBaseAdmin hba = new HBaseAdmin(conf);
> ...
>
> HConnectionManager.deleteConnection(hba.getConfiguration(), false);
>
>
>
> public HBaseAdmin (Configuration c)
>
>  throws MasterNotRunningException, ZooKeeperConnectionException {
>
>    this.conf = HBaseConfiguration.create(c);                           // new 
> conf, so It will create a new connection
>
>    this.connection = HConnectionManager.getConnection(this.conf);
>
>    this.pause = this.conf.getLong("hbase.client.pause", 1000);
>
>    this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
>
>    this.retryLongerMultiplier = 
> this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
>
>    this.connection.getMaster();
>
>  }
>
>
>
> In my cluster.
>
> New HBaseAdmin instance will create a new connection for zk.( we share 
> HBaseConfiguration for multithread, It is ok for 0.90.2)
>
> But, In 0.90.3 throw exception:
>
> checkHtableState happen an exception. begin reconnect. exception 
> info:org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to 
> connect to ZooKeeper but the connection closes immediately. This could be a 
> sign that the server has too many connections (30 is the default). Consider 
> inspecting your ZK server logs for that error and then make sure you are 
> reusing HBaseConfiguration as often as you can. See HTable's javadoc for more 
> information
>
>
>
>


about HBaseAdmin

2011-05-18 Thread Gaojinchao
The HBaseAdmin API has been modified, so we should document the following:

If you create a new instance, you need to delete its connection.

eg:
HBaseAdmin hba = new HBaseAdmin(conf);
...

HConnectionManager.deleteConnection(hba.getConfiguration(), false);



public HBaseAdmin (Configuration c)

  throws MasterNotRunningException, ZooKeeperConnectionException {

this.conf = HBaseConfiguration.create(c);   // new conf, so it will create a new connection

this.connection = HConnectionManager.getConnection(this.conf);

this.pause = this.conf.getLong("hbase.client.pause", 1000);

this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);

this.retryLongerMultiplier = 
this.conf.getInt("hbase.client.retries.longer.multiplier", 10);

this.connection.getMaster();

  }



In my cluster:

A new HBaseAdmin instance creates a new connection to ZK. (We share one
HBaseConfiguration across threads; that was OK in 0.90.2.)

But in 0.90.3 it throws an exception:

checkHtableState happen an exception. begin reconnect. exception 
info:org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to 
connect to ZooKeeper but the connection closes immediately. This could be a 
sign that the server has too many connections (30 is the default). Consider 
inspecting your ZK server logs for that error and then make sure you are 
reusing HBaseConfiguration as often as you can. See HTable's javadoc for more 
information





Re: Release 0.90.3 soon?

2011-05-16 Thread Gaojinchao
I can do some testing for 0.90.X if needed.


-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] on behalf of Jean-Daniel Cryans
Sent: May 17, 2011 2:31
To: dev@hbase.apache.org
Subject: Re: Release 0.90.3 soon?

Run the unit tests, try it on a handful of machines, basically just
make sure it's usable.

J-D

On Mon, May 16, 2011 at 11:26 AM, Ted Yu  wrote:
> I assume release manager has to be a committer. So, although I want to help
> ...
>
> On Mon, May 16, 2011 at 11:00 AM, Stack  wrote:
>
>> We cut cut a 0.91 now as Todd suggests but would need a release
>> manager to run the release.
>> St.Ack
>>
>> On Mon, May 16, 2011 at 9:26 AM, Ted Yu  wrote:
>> > Barney Frank's request for help brought me back to this discussion.
>> > When would 0.91 release come ?
>> >
>> > On Wed, May 4, 2011 at 10:31 AM, Ted Yu  wrote:
>> >
>> >> I am looking forward to this 0.91 release.
>> >>
>> >>
>> >> On Wed, May 4, 2011 at 10:23 AM, Todd Lipcon  wrote:
>> >>
>> >>> On Wed, May 4, 2011 at 8:57 AM, Ted Yu  wrote:
>> >>>
>> >>> > When would 0.91 branch be created ?
>> >>> > We should reduce the number of open critical bugs for 0.90 - each
>> such
>> >>> > issue
>> >>> > would soon be integrated to 3 branches after 0.91 branch is created.
>> >>> >
>> >>>
>> >>> Sorry, I should have been clear that I would imagine 0.91 would be a
>> "dev
>> >>> release" series like 0.89. That is to say, there would be no 0.91
>> branch,
>> >>> just a few "snapshot style" releases with minimal pre-release testing
>> to
>> >>> get
>> >>> us in shape for 92. These releases would never have point releases done
>> on
>> >>> top (and thus not need to have changes backported to them one
>> released).
>> >>>
>> >>> Consider it another name for an extended release candidate period for
>> >>> 0.92.
>> >>>
>> >>> -Todd
>> >>>
>> >>>
>> >>> > On Tue, May 3, 2011 at 10:47 PM, Todd Lipcon 
>> wrote:
>> >>> >
>> >>> > > Maybe we should do an 0.91 dev release tout de suite? Folks who
>> want
>> >>> > > 3777 and coprocessors and the other nice stuff can then help us
>> bake
>> >>> > > towards 92 ?
>> >>> > >
>> >>> > > Todd
>> >>> > >
>> >>> > > On Tuesday, May 3, 2011, Stack  wrote:
>> >>> > > > On Tue, May 3, 2011 at 9:21 PM, Ted Yu 
>> wrote:
>> >>> > > >> After discussion below, I wonder if 0.92.0RC would come out
>> before
>> >>> > > 0.90.4
>> >>> > > >> I hope that's the case - we're looking forward to coprocessor
>> which
>> >>> > > wouldn't
>> >>> > > >> be in 0.90.x anyway.
>> >>> > > >>
>> >>> > > >
>> >>> > > > Plan is to put up a 0.92.0 'soon', in the next week or so.  I'm
>> not
>> >>> > > > sure when 0.90.4 will see light of day.  There's a good few
>> blockers
>> >>> > > > and criticals but if a few of us have a go at them, we'll knock
>> it
>> >>> out
>> >>> > > > tout de suite (This is what I'm working on at mo).
>> >>> > > >
>> >>> > > > Good stuff,
>> >>> > > > St.Ack
>> >>> > > >
>> >>> > >
>> >>> > > --
>> >>> > > Todd Lipcon
>> >>> > > Software Engineer, Cloudera
>> >>> > >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Todd Lipcon
>> >>> Software Engineer, Cloudera
>> >>>
>> >>
>> >>
>> >
>>
>


Re: Table can't disable

2011-05-16 Thread Gaojinchao
Thanks, Stack.
You are very busy. :)
I will try my best to do something for the community.



-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] on behalf of Stack
Sent: May 17, 2011 3:21
To: dev@hbase.apache.org
Cc: Chenjian
Subject: Re: Table can't disable

On Sat, May 14, 2011 at 7:08 PM, Gaojinchao  wrote:
> I try to make a patch and process it.


OK.  What are you trying to do in the patch?  (Turns out my issue was
different to what you see).



> I want to send a zk message again when close region is timeout.
> I try to reproduce and verify it. But it is difficulty.

Yes, this stuff is tricky.  It's not easy making a test.  Have you seen
TestZKBasedOpenCloseRegion and TestMasterFailover?  These do messing
about with zk state.  Perhaps they help?

> Could you give me some help and review it?
>

Did you change code in the below?  If so, can you send a patch only?
Or better, attach it to a JIRA?  Patch is good because it is the
difference between what is hbase and what you have added.  Otherwise,
it takes a bit of work figuring what you have added.

Thank you Gao,
St.Ack

> Thanks.
>
> case PENDING_CLOSE:
>                LOG.info("Region has been PENDING_CLOSE for too " +
>                    "long, running forced unassign again on region=" +
>                    regionInfo.getRegionNameAsString());
>                  try {
>                    // If the server got the RPC, it will transition the node
>                    // to CLOSING, so only do something here if no node exists
>                    if (!ZKUtil.watchAndCheckExists(watcher,
>                      ZKAssign.getNodeName(watcher, 
> regionInfo.getEncodedName()))) {
>                      // Queue running of an unassign -- do actual unassign
>                      // outside of the regionsInTransition lock.
>                      unassigns.add(regionInfo);
>                    }
>                    else
>                    {
>                                                                               
>  // I add some code
>                       try {
>                          String node = ZKAssign.getNodeName(watcher,
>                             regionInfo.getEncodedName());
>                          Stat stat = new Stat();
>                          RegionTransitionData data = 
> ZKAssign.getDataNoWatch(watcher,
>                             node, stat);
>                          if (data == null) {
>                             LOG.warn("Data is null, node " + node + " no 
> longer exists");
>                             break;
>                          }
>                          if (data.getEventType() != 
> EventType.RS_ZK_REGION_CLOSED) {
>                             LOG.debug("Region has transitioned to CLOSING, 
> allowing " +
>                                  "watched event handlers to process");
>                             break;
>                          }
>
>                         // In this case, the region server has CLOSED
>                         data = new RegionTransitionData(
>                                 EventType.RS_ZK_REGION_CLOSED, 
> regionInfo.getRegionName(),
>                                    master.getServerName());
>                         if (ZKUtil.setData(watcher, node, data.getBytes(),
>                                   stat.getVersion()+1 )) {
>
>                            // Node is now closed, let's trigger another close 
> handler
>                            LOG.info("Try to finish closed region=" +
>                               regionInfo.getRegionNameAsString() + "again" );
>                         }
>
>                      } catch (KeeperException ke) {
>                         LOG.error("Unexpected ZK exception timing out 
> PENDING_CLOSE region",
>                            ke);
>                         break;
>                        }
>
>                     }
>                  } catch (NoNodeException e) {
>                    LOG.debug("Node no longer existed so not forcing another " 
> +
>                      "unassignment");
>                  } catch (KeeperException e) {
>                    LOG.warn("Unexpected ZK exception timing out a region " +
>                      "close", e);
>                  }
>                  break;
>
-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: May 10, 2011 1:38
To: dev@hbase.apache.org
Subject: Re: Table can't disable
>
> I'm looking into this Gao.  Something similar seems to have happened
> here over the w/e.  Let me look at our systems first and then I'l be
> back to you

Re: Thread leak // Hmaster has some warn logs.

2011-05-16 Thread Gaojinchao
Sorry, I made a mistake. The HLog is closed in the code.
But why are there so many LogSyncer threads?


-----Original Message-----
From: Gaojinchao
Sent: May 16, 2011 14:45
To: dev@hbase.apache.org
Subject: Thread leak // Hmaster has some warn logs.

While trying to reproduce the CLOSE_WAIT socket issue, I found a new issue:
a thread leak in the HMaster process.

jstack 7983:
2011-05-16 13:25:42
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.1-b03 mixed mode):
"IPC Server handler 4 on 6.logSyncer" daemon prio=10 tid=0x7f42f065b800 
nid=0x502e waiting on condition [0x7f42c3358000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:962)
count: 
jstack 7983 | grep 6.logSyncer| wc -l
411

In this case, it seems the LogSyncer thread is never stopped:
the HLog is never closed when it is created through the createHRegion API.

public static HRegion createHRegion(final HRegionInfo info, final Path rootDir,
    final Configuration conf)
throws IOException {
  Path tableDir =
    HTableDescriptor.getTableDir(rootDir, info.getTableDesc().getName());
  Path regionDir = HRegion.getRegionDir(tableDir, info.getEncodedName());
  FileSystem fs = FileSystem.get(conf);
  fs.mkdirs(regionDir);
  HRegion region = HRegion.newHRegion(tableDir,
    // this HLog is never closed when createHRegion is used
    new HLog(fs, new Path(regionDir, HConstants.HREGION_LOGDIR_NAME),
      new Path(regionDir, HConstants.HREGION_OLDLOGDIR_NAME), conf),
    fs, conf, info, null);
  region.initialize();
  return region;
}
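The leak pattern can be reproduced outside HBase. Below is a minimal, self-contained sketch (plain Java, no HBase classes; the ".logSyncer" thread name is only borrowed for illustration): each construction starts a sleeper thread that nobody stops, so the jstack count grows until the creator interrupts it, which is roughly what closing the HLog would do.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLeakDemo {
    // start a daemon thread that sleeps forever, like HLog's LogSyncer
    static Thread newSyncer(int i) {
        Thread t = new Thread(() -> {
            try {
                while (true) Thread.sleep(1000);
            } catch (InterruptedException e) {
                // exit when the owner "closes" us
            }
        }, i + ".logSyncer");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // count live threads whose name matches the leaked pattern, as jstack did
    static long syncerCount() {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().endsWith(".logSyncer")).count();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> leaked = new ArrayList<>();
        for (int i = 0; i < 5; i++) leaked.add(newSyncer(i));
        System.out.println("leaked=" + syncerCount());
        // the missing step in createHRegion: close the log, i.e. stop its syncer
        for (Thread t : leaked) t.interrupt();
        for (Thread t : leaked) t.join();
        System.out.println("stopped=" + (syncerCount() == 0));
    }
}
```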

I also found some sockets in CLOSE_WAIT, but I am not sure what is going on.
I need to keep digging.

tcp  227  0 157.5.100.1:6   157.5.100.1:52780   CLOSE_WAIT
tcp1  0 157.5.100.1:6   157.5.100.5:56633   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.4:50772   CLOSE_WAIT
tcp  227  0 157.5.100.1:6   157.5.100.1:59921   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.3:37877   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.5:57556   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.5:44967   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.3:45944   CLOSE_WAIT
C4C1:/opt/hbasetools # netstat -an |grep 6 | grep CLOSE_WAIT | wc -l
140
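For background: a socket sits in CLOSE_WAIT when the remote peer has closed its end but the local process has not yet called close() on its side. A minimal, self-contained sketch on the loopback interface (not HBase code):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();
            client.close();                       // peer sends FIN
            int eof = accepted.getInputStream().read();
            // EOF (-1) was seen, but until accepted.close() runs, this
            // side's socket lingers in CLOSE_WAIT -- what netstat showed
            System.out.println("eof=" + eof + " closedBefore=" + accepted.isClosed());
            accepted.close();
            System.out.println("closedAfter=" + accepted.isClosed());
        }
    }
}
```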

-----Original Message-----
From: Gaojinchao
Sent: May 13, 2011 16:59
To: u...@hbase.apache.org
Subject: Re: Hmaster has some warn logs.

Hi,
I have found a hint: all of the sockets are in CLOSE_WAIT.
But I am not sure what is going on; I need to keep digging.

If anyone has experience with this, please share it. Thanks.

20020 (region server port)
# netstat -an | grep 20020
tcp   13  0 157.5.100.13:20020  :::*LISTEN  
tcp  178  0 157.5.100.13:20020  157.5.100.11:56169  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36908  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:38372  CLOSE_WAIT  
tcp   1537168  0 157.5.100.13:20020  157.5.100.6:45643   
ESTABLISHED 
tcp  178  0 157.5.100.13:20020  157.5.100.11:52667  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:48926  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:37150  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36665  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36432  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:46395  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:57346  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:47115  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:55164  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:50981  CLOSE_WAIT  
tcp  250  0 157.5.100.13:20020  157.5.100.11:36214  CLOSE_WAIT  

-----Original Message-----
From: Gaojinchao [mailto:gaojinc...@huawei.com]
Sent: May 13, 2011 8:33
To: u...@hbase.apache.org
Subject: re: Hmaster has some warn logs.

I can't reproduce it; I need to dig through the logs and code.

-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: May 13, 2011 1:08
To: u...@hbase.apache.org
Subject: Re: Hmaster has some warn logs.

Seems the master had lots of problems talking to that node... can you
repro? If you jstack, are all the handlers filled?

J-D

2011/5/12 Gaojinchao :
> Thank you for reply.
>
> It seems the master had some problem, but I am not sure what is going on.
> I am not familiar with RPC and need to keep digging.
>
>
> // the master connection was ok
> 2011-05-04 04:47:06,267 INFO org.apache.hadoop.hbase.master.CatalogJanitor: 
> Scanned 378 catalog row(s) and gc'd 9 unreferenced parent region(s)
> 2011-05-04 04:52:05,833 INFO o

Thread leak // Hmaster has some warn logs.

2011-05-15 Thread Gaojinchao
While trying to reproduce the CLOSE_WAIT socket issue, I found a new issue:
a thread leak in the HMaster process.

jstack 7983:
2011-05-16 13:25:42
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.1-b03 mixed mode):
"IPC Server handler 4 on 6.logSyncer" daemon prio=10 tid=0x7f42f065b800 
nid=0x502e waiting on condition [0x7f42c3358000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:962)
count: 
jstack 7983 | grep 6.logSyncer| wc -l
411

In this case, it seems the LogSyncer thread is never stopped:
the HLog is never closed when it is created through the createHRegion API.

public static HRegion createHRegion(final HRegionInfo info, final Path rootDir,
    final Configuration conf)
throws IOException {
  Path tableDir =
    HTableDescriptor.getTableDir(rootDir, info.getTableDesc().getName());
  Path regionDir = HRegion.getRegionDir(tableDir, info.getEncodedName());
  FileSystem fs = FileSystem.get(conf);
  fs.mkdirs(regionDir);
  HRegion region = HRegion.newHRegion(tableDir,
    // this HLog is never closed when createHRegion is used
    new HLog(fs, new Path(regionDir, HConstants.HREGION_LOGDIR_NAME),
      new Path(regionDir, HConstants.HREGION_OLDLOGDIR_NAME), conf),
    fs, conf, info, null);
  region.initialize();
  return region;
}

I also found some sockets in CLOSE_WAIT, but I am not sure what is going on.
I need to keep digging.

tcp  227  0 157.5.100.1:6   157.5.100.1:52780   CLOSE_WAIT
tcp1  0 157.5.100.1:6   157.5.100.5:56633   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.4:50772   CLOSE_WAIT
tcp  227  0 157.5.100.1:6   157.5.100.1:59921   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.3:37877   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.5:57556   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.5:44967   CLOSE_WAIT
tcp  233  0 157.5.100.1:6   157.5.100.3:45944   CLOSE_WAIT
C4C1:/opt/hbasetools # netstat -an |grep 6 | grep CLOSE_WAIT | wc -l
140

-----Original Message-----
From: Gaojinchao
Sent: May 13, 2011 16:59
To: u...@hbase.apache.org
Subject: Re: Hmaster has some warn logs.

Hi,
I have found a hint: all of the sockets are in CLOSE_WAIT.
But I am not sure what is going on; I need to keep digging.

If anyone has experience with this, please share it. Thanks.

20020 (region server port)
# netstat -an | grep 20020
tcp   13  0 157.5.100.13:20020  :::*LISTEN  
tcp  178  0 157.5.100.13:20020  157.5.100.11:56169  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36908  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:38372  CLOSE_WAIT  
tcp   1537168  0 157.5.100.13:20020  157.5.100.6:45643   
ESTABLISHED 
tcp  178  0 157.5.100.13:20020  157.5.100.11:52667  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:48926  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:37150  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36665  CLOSE_WAIT  
tcp  212  0 157.5.100.13:20020  157.5.100.11:36432  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:46395  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:57346  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:47115  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:55164  CLOSE_WAIT  
tcp  178  0 157.5.100.13:20020  157.5.100.11:50981  CLOSE_WAIT  
tcp  250  0 157.5.100.13:20020  157.5.100.11:36214  CLOSE_WAIT  

-----Original Message-----
From: Gaojinchao [mailto:gaojinc...@huawei.com]
Sent: May 13, 2011 8:33
To: u...@hbase.apache.org
Subject: re: Hmaster has some warn logs.

I can't reproduce it; I need to dig through the logs and code.

-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: May 13, 2011 1:08
To: u...@hbase.apache.org
Subject: Re: Hmaster has some warn logs.

Seems the master had lots of problems talking to that node... can you
repro? If you jstack, are all the handlers filled?

J-D

2011/5/12 Gaojinchao :
> Thank you for reply.
>
> It seems the master had some problem, but I am not sure what is going on.
> I am not familiar with RPC and need to keep digging.
>
>
> // the master connection was ok
> 2011-05-04 04:47:06,267 INFO org.apache.hadoop.hbase.master.CatalogJanitor: 
> Scanned 378 catalog row(s) and gc'd 9 unreferenced parent region(s)
> 2011-05-04 04:52:05,833 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
> Skipping load balancing.  servers=3 regions=371 average=123.64 
> mostloaded=124 leastloaded=124
>
> //the connection had some problem.
> 2011-05-04 0

re: Table can't disable

2011-05-14 Thread Gaojinchao

Hi, Stack. 

I am trying to make a patch for this issue.
I want to send the ZK message again when closing a region times out.
I tried to reproduce and verify it, but it is difficult.
Could you give me some help and review it?

Thanks.

case PENDING_CLOSE:
  LOG.info("Region has been PENDING_CLOSE for too " +
      "long, running forced unassign again on region=" +
      regionInfo.getRegionNameAsString());
  try {
    // If the server got the RPC, it will transition the node
    // to CLOSING, so only do something here if no node exists
    if (!ZKUtil.watchAndCheckExists(watcher,
        ZKAssign.getNodeName(watcher, regionInfo.getEncodedName()))) {
      // Queue running of an unassign -- do actual unassign
      // outside of the regionsInTransition lock.
      unassigns.add(regionInfo);
    } else {
      // I add some code
      try {
        String node = ZKAssign.getNodeName(watcher,
            regionInfo.getEncodedName());
        Stat stat = new Stat();
        RegionTransitionData data = ZKAssign.getDataNoWatch(watcher,
            node, stat);
        if (data == null) {
          LOG.warn("Data is null, node " + node + " no longer exists");
          break;
        }
        if (data.getEventType() != EventType.RS_ZK_REGION_CLOSED) {
          LOG.debug("Region has transitioned to CLOSING, allowing " +
              "watched event handlers to process");
          break;
        }

        // In this case, the region server has CLOSED the region
        data = new RegionTransitionData(
            EventType.RS_ZK_REGION_CLOSED, regionInfo.getRegionName(),
            master.getServerName());
        if (ZKUtil.setData(watcher, node, data.getBytes(),
            stat.getVersion() + 1)) {
          // Node is now closed, let's trigger another close handler
          LOG.info("Try to finish closed region=" +
              regionInfo.getRegionNameAsString() + " again");
        }
      } catch (KeeperException ke) {
        LOG.error("Unexpected ZK exception timing out PENDING_CLOSE region",
            ke);
        break;
      }
    }
  } catch (NoNodeException e) {
    LOG.debug("Node no longer existed so not forcing another " +
        "unassignment");
  } catch (KeeperException e) {
    LOG.warn("Unexpected ZK exception timing out a region " +
        "close", e);
  }
  break;

-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: May 10, 2011 1:38
To: dev@hbase.apache.org
Subject: Re: Table can't disable

I'm looking into this Gao.  Something similar seems to have happened
here over the w/e.  Let me look at our systems first and then I'l be
back to you on this below.  Thanks,
St.Ack

On Sun, May 8, 2011 at 4:58 AM, Gaojinchao  wrote:
> Today I tested HBase version 0.90.3.
> It seems like there are some bugs:
> 1. If the node exists and its state is RS_ZK_REGION_CLOSED,
>  we should call ClosedRegionHandler to close the region. In that case the region
> has been closed by the region server.
>
> case PENDING_CLOSE:
>                LOG.info("Region has been PENDING_CLOSE for too " +
>                    "long, running forced unassign again on region=" +
>                    regionInfo.getRegionNameAsString());
>                  try {
>                    // If the server got the RPC, it will transition the node
>                    // to CLOSING, so only do something here if no node exists
>                    if (!ZKUtil.watchAndCheckExists(watcher,
>                      ZKAssign.getNodeName(watcher, 
> regionInfo.getEncodedName()))) {
>                      // Queue running of an unassign -- do actual unassign
>                      // outside of the regionsInTransition lock.
>                      un

Re: [jira] [Updated] (HBASE-3878) Hbase client throws NoSuchElementException

2011-05-12 Thread Gaojinchao
I tried to make a patch yesterday; it is the same as Ted's.
I am not sure whether the function deleteCachedLocation has an issue;
I came across it while searching the code.
I wanted to ask for advice. Sorry, I will study it. :)

Thanks.

-----Original Message-----
From: Stack [mailto:saint@gmail.com]
Sent: May 13, 2011 9:53
To: dev@hbase.apache.org
Cc: dev@hbase.apache.org
Subject: Re: [jira] [Updated] (HBASE-3878) Hbase client throws NoSuchElementException

What are you suggesting, Gao?  You must be specific about what you would have us
look at because we are bad at inference.

Thanks
Stack



On May 12, 2011, at 18:01, Gaojinchao  wrote:

> Hi Ted, Thanks for your patch. 
> 
> I am not familiar with SoftValueSortedMap. 
> Please check this function.
> 
>  void deleteCachedLocation(final byte [] tableName, final byte [] row) {
>  synchronized (this.cachedRegionLocations) {
>SoftValueSortedMap tableLocations =
>getTableLocations(tableName);
>// start to examine the cache. we can only do cache actions
>// if there's something in the cache for this table.
>if (!tableLocations.isEmpty()) {
>  HRegionLocation rl = getCachedLocation(tableName, row);
>  if (rl != null) {
>tableLocations.remove(rl.getRegionInfo().getStartKey());
>if (LOG.isDebugEnabled()) {
>  LOG.debug("Removed " +
>rl.getRegionInfo().getRegionNameAsString() +
>" for tableName=" + Bytes.toString(tableName) +
>" from cache " + "because of " + Bytes.toStringBinary(row));
>}
>  }
>}
>  }
>}
> 
> 
> -----Original Message-----
> From: Ted Yu (JIRA) [mailto:j...@apache.org]
> Sent: May 12, 2011 22:44
> To: Gaojinchao
> Subject: [jira] [Updated] (HBASE-3878) Hbase client throws NoSuchElementException
> 
> 
> [ 
> https://issues.apache.org/jira/browse/HBASE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
> 
> Ted Yu updated HBASE-3878:
> --
> 
>Attachment: 3878.patch
> 
> Added try/catch block to guard against NoSuchElementException.
> 
> We already check possibleRegion not being null.
> 
>> Hbase client throws NoSuchElementException
>> --
>> 
>>Key: HBASE-3878
>>    URL: https://issues.apache.org/jira/browse/HBASE-3878
>>Project: HBase
>> Issue Type: Bug
>> Components: client
>>   Affects Versions: 0.90.2
>>   Reporter: gaojinchao
>>Fix For: 0.90.4
>> 
>>Attachments: 3878.patch
>> 
>> 
>> Soft reference objects, which are cleared at the discretion of the 
>> garbage collector in response to memory demand. 
>> I used YCSB to put data and it threw this exception.
>>>>>> 
>>>>>> Hbase Code:
>>>>>>// Cut the cache so that we only get the part that could contain
>>>>>>// regions that match our key
>>>>>>SoftValueSortedMap matchingRegions =
>>>>>>  tableLocations.headMap(row);
>>>>>> 
>>>>>>// if that portion of the map is empty, then we're done. otherwise,
>>>>>>// we need to examine the cached location to verify that it is
>>>>>>// a match by end key as well.
>>>>>>if (!matchingRegions.isEmpty()) {
>>>>>>  HRegionLocation possibleRegion =
>>>>>>matchingRegions.get(matchingRegions.lastKey());
>>>>>> 
>>>>>>  ycsb client log:
>>>>>> 
>>>>>>  [java] begin StatusThread run
>>>>>>   [java] java.util.NoSuchElementException
>>>>>>   [java] at java.util.TreeMap.key(TreeMap.java:1206)
>>>>>>   [java] at
>>>> java.util.TreeMap$NavigableSubMap.lastKey(TreeMap.java:1435)
>>>>>>   [java] at
>>>> org.apache.hadoop.hbase.util.SoftValueSortedMap.lastKey(SoftValueSort
>>>> edMap.java:131)
>>>>>>   [java] at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
>>>> tation.getCachedLocation(HConnectionManager.java:841)
>>>>>>   [java] at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
>>>> tation.locateRegionInMeta(HConnectionManager.java:664)
>>>>>>   [java] at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConn

Re: [jira] [Updated] (HBASE-3878) Hbase client throws NoSuchElementException

2011-05-12 Thread Gaojinchao
Hi Ted, Thanks for your patch. 

I am not familiar with SoftValueSortedMap. 
Please check this function.

  void deleteCachedLocation(final byte [] tableName, final byte [] row) {
    synchronized (this.cachedRegionLocations) {
      SoftValueSortedMap tableLocations =
          getTableLocations(tableName);
      // start to examine the cache. we can only do cache actions
      // if there's something in the cache for this table.
      if (!tableLocations.isEmpty()) {
        HRegionLocation rl = getCachedLocation(tableName, row);
        if (rl != null) {
          tableLocations.remove(rl.getRegionInfo().getStartKey());
          if (LOG.isDebugEnabled()) {
            LOG.debug("Removed " +
                rl.getRegionInfo().getRegionNameAsString() +
                " for tableName=" + Bytes.toString(tableName) +
                " from cache " + "because of " + Bytes.toStringBinary(row));
          }
        }
      }
    }
  }


-----Original Message-----
From: Ted Yu (JIRA) [mailto:j...@apache.org]
Sent: May 12, 2011 22:44
To: Gaojinchao
Subject: [jira] [Updated] (HBASE-3878) Hbase client throws NoSuchElementException


 [ 
https://issues.apache.org/jira/browse/HBASE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3878:
--

Attachment: 3878.patch

Added try/catch block to guard against NoSuchElementException.

We already check possibleRegion not being null.

> Hbase client throws NoSuchElementException
> --
>
> Key: HBASE-3878
> URL: https://issues.apache.org/jira/browse/HBASE-3878
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.2
>Reporter: gaojinchao
> Fix For: 0.90.4
>
> Attachments: 3878.patch
>
>
> Soft reference objects, which are cleared at the discretion of the 
> garbage collector in response to memory demand. 
> I used YCSB to put data and it threw this exception.
> >>>> 
> >>>>  Hbase Code:
> >>>> // Cut the cache so that we only get the part that could contain
> >>>> // regions that match our key
> >>>> SoftValueSortedMap matchingRegions =
> >>>>   tableLocations.headMap(row);
> >>>> 
> >>>> // if that portion of the map is empty, then we're done. otherwise,
> >>>> // we need to examine the cached location to verify that it is
> >>>> // a match by end key as well.
> >>>> if (!matchingRegions.isEmpty()) {
> >>>>   HRegionLocation possibleRegion =
> >>>> matchingRegions.get(matchingRegions.lastKey());
> >>>> 
> >>>>   ycsb client log:
> >>>> 
> >>>>   [java] begin StatusThread run
> >>>>[java] java.util.NoSuchElementException
> >>>>[java] at java.util.TreeMap.key(TreeMap.java:1206)
> >>>>[java] at
> >> java.util.TreeMap$NavigableSubMap.lastKey(TreeMap.java:1435)
> >>>>[java] at
> >> org.apache.hadoop.hbase.util.SoftValueSortedMap.lastKey(SoftValueSort
> >> edMap.java:131)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> >> tation.getCachedLocation(HConnectionManager.java:841)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> >> tation.locateRegionInMeta(HConnectionManager.java:664)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> >> tation.locateRegion(HConnectionManager.java:590)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> >> tation.processBatch(HConnectionManager.java:1114)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> >> tation.processBatchOfPuts(HConnectionManager.java:1234)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
> >>>>[java] at
> >> org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
> >>>>[java] at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
> >>>>[java] at com.yahoo.ycsb.db.HBaseClient.insert(Unknown Source)
> >>>>[java] at com.yahoo.ycsb.DBWrapper.insert(Unknown Source)
> >>>>[java] at com.yahoo.ycsb.workloads.MyWorkload.doInsert(Unknown
> >> Source)
> >>>>[java] at com.yahoo.ycsb.ClientThread.run(Unknown Source)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
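The failure mode patched above can be reproduced with a plain TreeMap (SoftValueSortedMap wraps a sorted map of soft values underneath): headMap() can return an empty view, and calling lastKey() on that empty view throws NoSuchElementException. A minimal sketch, not HBase code:

```java
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

public class HeadMapDemo {
    // mimics getCachedLocation: cut the cache to keys before the probe row,
    // then take the closest preceding entry -- guarded against an empty view
    static String safeLookup(SortedMap<String, String> cache, String row) {
        SortedMap<String, String> matching = cache.headMap(row);
        if (matching.isEmpty()) {
            return null;  // nothing cached before this row
        }
        return matching.get(matching.lastKey());
    }

    public static void main(String[] args) {
        SortedMap<String, String> cache = new TreeMap<>();
        cache.put("row-b", "region-1");
        // unguarded lastKey() on the empty headMap throws, as in HBASE-3878
        boolean threw = false;
        try {
            cache.headMap("row-a").lastKey();
        } catch (NoSuchElementException e) {
            threw = true;
        }
        System.out.println("threw=" + threw + " lookup=" + safeLookup(cache, "row-a"));
    }
}
```

Note the subtlety raised in the thread: with soft values, entries can vanish between the isEmpty() check and the lastKey() call, which is why the patch also wraps the call in try/catch.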


a little bug :)

2011-05-12 Thread Gaojinchao
HBase version: 0.90.2

" leastloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions());

Should be:

" leastloaded=" + serversByLoad.firstKey().getLoad().getNumberOfRegions());



if (serversByLoad.lastKey().getLoad().getNumberOfRegions() <= ceiling &&
    serversByLoad.firstKey().getLoad().getNumberOfRegions() >= floor) {
  // Skipped because no server outside (min,max) range
  LOG.info("Skipping load balancing.  servers=" + numServers + " " +
      "regions=" + numRegions + " average=" + average + " " +
      "mostloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions() +
      " leastloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions());
  return null;
}
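A tiny self-contained illustration of the reported log bug (assuming serversByLoad sorts ascending by region count, so firstKey() is the least loaded server and lastKey() the most loaded; the region counts are invented):

```java
import java.util.TreeMap;

public class BalancerLogDemo {
    public static void main(String[] args) {
        // region counts per server, ascending: firstKey() = least loaded,
        // lastKey() = most loaded
        TreeMap<Integer, String> serversByLoad = new TreeMap<>();
        serversByLoad.put(120, "rs3");
        serversByLoad.put(123, "rs2");
        serversByLoad.put(124, "rs1");
        // the buggy line printed lastKey() for both values;
        // the fix uses firstKey() for "leastloaded"
        System.out.println("mostloaded=" + serversByLoad.lastKey()
            + " leastloaded=" + serversByLoad.firstKey());
    }
}
```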


A question about OutOfMemoryError

2011-05-10 Thread Gaojinchao
I have a question about region server OutOfMemoryError.

The write data flow is controlled by hbase.regionserver.global.memstore.upperLimit.
When the upperLimit is reached, the HBase client stops putting data.
But it seems the HLog replay path does not respect it.

If a region server crashes, will the others crash in turn because the whole
cluster is busy?
It relates to the parameter hbase.regionserver.executor.openregion.threads and
the memstore size in version 0.90.x.
If the number of threads is higher, there is a big risk.

I remember some machines in my cluster crashed on 0.20.6, but I could not find
the root cause. It seems to be this reason. I want to test it on 0.90.x.
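A toy sketch of the concern (all names and numbers are invented; real HBase memstore accounting is far more involved): client puts are refused once the global size hits the limit, while a replay path that skips the check can push memory past it.

```java
public class MemstoreLimitDemo {
    static final long UPPER_LIMIT = 100;
    static long globalMemstoreSize = 0;

    // client write path: refused/blocked once the limit would be exceeded
    static boolean clientPut(long bytes) {
        if (globalMemstoreSize + bytes > UPPER_LIMIT) {
            return false;
        }
        globalMemstoreSize += bytes;
        return true;
    }

    // hypothetical replay path: no limit check, mirroring the question above
    static void replayEdit(long bytes) {
        globalMemstoreSize += bytes;
    }

    public static void main(String[] args) {
        while (clientPut(30)) { }    // fills to 90; the fourth put is refused
        System.out.println("after puts=" + globalMemstoreSize);
        replayEdit(50);              // log replay blows past the limit
        System.out.println("after replay=" + globalMemstoreSize
            + " over=" + (globalMemstoreSize > UPPER_LIMIT));
    }
}
```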








re: A question about Unit Test

2011-05-10 Thread Gaojinchao
Thanks for your reply.

1/ I have another question: does something need to be cleaned up when a test
case fails?
In my case, the test cases run automatically at night.
I find some test cases fail randomly and am trying to dig into it.


Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.599 sec <<< 
FAILURE!
testMergeTool(org.apache.hadoop.hbase.util.TestMergeTool)  Time elapsed: 7.586 
sec  <<< FAILURE!
junit.framework.AssertionFailedError: 'merging regions 0 and 1' failed
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at 
org.apache.hadoop.hbase.util.TestMergeTool.mergeAndVerify(TestMergeTool.java:182)
at 
org.apache.hadoop.hbase.util.TestMergeTool.testMergeTool(TestMergeTool.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)

test3686a(org.apache.hadoop.hbase.client.TestScannerTimeout)  Time elapsed: 
1.233 sec  <<< ERROR!
org.apache.hadoop.hbase.TableNotFoundException: t
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:724)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:593)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:558)
at org.apache.hadoop.hbase.client.HTable.(HTable.java:171)
at org.apache.hadoop.hbase.client.HTable.(HTable.java:130)
at 
org.apache.hadoop.hbase.client.TestScannerTimeout.test3686a(TestScannerTimeout.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)

2/ DNS Error test case:
TestClockSkewDetection
testBadOriginalRootLocation
testScanner
TestCatalogTracker

Error logs:
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.031 sec <<< 
FAILURE!
testClockSkewDetection(org.apache.hadoop.hbase.master.TestClockSkewDetection)  
Time elapsed: 0.02 sec  <<< ERROR!
java.lang.IllegalArgumentException: Could not resolve the DNS name of 
example.org:1234
at 
org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
at org.apache.hadoop.hbase.HServerAddress.&lt;init&gt;(HServerAddress.java:66)
at 
org.apache.hadoop.hbase.master.TestClockSkewDetection.testClockSkewDetection(TestClockSkewDetection.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)




-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: May 10, 2011 1:35
To: dev@hbase.apache.org
Subject: Re: A question about Unit Test

Did you try it Gao?  The below looks like it might be a little
'dangerous' in that an empty address happens when the hosted
InetSocketAddress fails to resolve.  Which test is failing?  We should
for sure make it so tests pass if you are not connected to the net.

Thanks Gao,
St.Ack

On Sun, May 8, 2011 at 12:40 AM, Gaojinchao  wrote:
>   private static final HServerAddress HSA =
>    new HServerAddress("example.org:1234");
>
> On my machine, it always fails because of a DNS error.
> The example.org hostname is hard-coded. We could add a function to get the
> hostname, like this:
>
>  public static String gethostname()
>  {
>        String h

Re: Table can't disable

2011-05-08 Thread Gaojinchao
.,
 server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED)
2011-05-08 17:52:47,388 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, server=C4C4.site,60020,1304820199467, 
region=4418fb197685a21f77e151e401cf8b66

// The region server had closed the region, but the region state had already been 
cleared, so it printed a warning log.

2011-05-08 17:52:47,388 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received CLOSED for region 4418fb197685a21f77e151e401cf8b66 from server 
C4C4.site,60020,1304820199467 but region was in  the state null and not in 
expected PENDING_CLOSE or CLOSING states
2011-05-08 17:52:47,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Overwriting 4418fb197685a21f77e151e401cf8b66 on 
serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, 
usedHeap=4097, maxHeap=8175)

// The region state was set to PENDING_CLOSE again, so the table could be 
neither disabled nor enabled.
2011-05-08 17:52:47,398 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 (offlining)



Table can't disable

2011-05-08 Thread Gaojinchao
Today I tested HBase version 0.90.3.
It seems like there are some bugs:
1. If the node exists and the node state is RS_ZK_REGION_CLOSED, 
  we should call ClosedRegionHandler to close the region; in that case the region 
has already been closed by the region server.

case PENDING_CLOSE:
LOG.info("Region has been PENDING_CLOSE for too " +
"long, running forced unassign again on region=" +
regionInfo.getRegionNameAsString());
  try {
// If the server got the RPC, it will transition the node
// to CLOSING, so only do something here if no node exists
if (!ZKUtil.watchAndCheckExists(watcher,
  ZKAssign.getNodeName(watcher, 
regionInfo.getEncodedName()))) {
  // Queue running of an unassign -- do actual unassign
  // outside of the regionsInTransition lock.
  unassigns.add(regionInfo);
}
else
{
// We need to handle the case where the node 
// state is RS_ZK_REGION_CLOSED 
}
  } catch (NoNodeException e) {
LOG.debug("Node no longer existed so not forcing another " +
  "unassignment");
  } catch (KeeperException e) {
LOG.warn("Unexpected ZK exception timing out a region " +
  "close", e);
  }
2. Otherwise, it seems like a ZK message is being lost, or there are bugs in 
disabling a table while the region server is splitting a region. The logs are 
below. I need to keep digging.
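The handling proposed in point 1 can be sketched in isolation. This is a hypothetical, simplified model in plain Java, not the real AssignmentManager API; the method name and the returned action strings are my own:

```java
class PendingCloseTimeoutSketch {
  enum NodeState { RS_ZK_REGION_CLOSING, RS_ZK_REGION_CLOSED }

  // Decide what the timeout monitor should do for a region stuck in
  // PENDING_CLOSE. Returns an action name instead of performing it,
  // so the decision logic can be tested in isolation.
  static String onPendingCloseTimeout(boolean nodeExists, NodeState state) {
    if (!nodeExists) {
      // No znode: the close RPC never reached the region server, so
      // force another unassign (what 0.90.3 already does).
      return "FORCE_UNASSIGN";
    }
    if (state == NodeState.RS_ZK_REGION_CLOSED) {
      // The region server already closed the region; finish the close
      // on the master side instead of unassigning again.
      return "RUN_CLOSED_REGION_HANDLER";
    }
    // Node exists in some other state (e.g. CLOSING): keep waiting.
    return "WAIT";
  }

  public static void main(String[] args) {
    System.out.println(onPendingCloseTimeout(false, null));
    System.out.println(onPendingCloseTimeout(true, NodeState.RS_ZK_REGION_CLOSED));
  }
}
```

The key difference from the 0.90.3 code is the middle branch: a CLOSED node should route to the closed-region handler rather than falling through as if the node did not exist.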

Shell logs:
hbase(main):003:0*
hbase(main):004:0* disable 'ufdr'
11/05/08 18:20:40 DEBUG zookeeper.ZKUtil: hconnection-0x2fcd58283c0037 Set 
watcher on existing znode /hbase/root-region-server
11/05/08 18:20:40 DEBUG zookeeper.ZKUtil: hconnection-0x2fcd58283c0037 
Retrieved 15 byte(s) of data from znode /hbase/root-region-server and set 
watcher; C4C5.site:60020
11/05/08 18:53:19 DEBUG zookeeper.ZKUtil: hconnection-0x2fcd58283c0037 
Retrieved 9 byte(s) of data from znode /hbase/table/ufdr; data=DISABLING

ERROR: org.apache.hadoop.hbase.RegionException: Retries exhausted, it took too 
long to wait for the table ufdr to be disabled.


Region server logs:

2011-05-08 17:42:44,862 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Instantiated 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
2011-05-08 17:42:45,468 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
loaded 
hdfs://C4C1:9000/hbase/ufdr/4418fb197685a21f77e151e401cf8b66/value/3442771824694350714.8e9a3b05abe1c3a692999cf5e8dfd9dd,
 isReference=true, isBulkLoadResult=false, seqid=627830, majorCompaction=false
2011-05-08 17:42:45,471 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.;
 next sequenceid=627831
2011-05-08 17:42:45,471 DEBUG 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
for 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 because Region has references on open; priority=9, compaction queue size=40
2011-05-08 17:42:45,476 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added 
daughter 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 in region .META.,,1, serverInfo=C4C4.site,60020,1304820199467
2011-05-08 17:42:45,476 INFO 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META 
updated, and report to master. 
Parent=ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.,
 new regions: 
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66..
 Split took 0sec
2011-05-08 17:44:25,731 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
2011-05-08 17:48:11,066 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Starting compaction on region 
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
2011-05-08 17:48:11,067 INFO org.apache.hadoop.hbase.regionserver.Store: 
Started compaction of 1 file(s) in cf=value, hasReferences=true, into 
hdfs://C4C1:9000/hbase/ufdr/4418fb197685a21f77e151e401cf8b66/.tmp, 
seqid=627830, totalSize=892.2m
2011-05-08 17:48:11,067 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
Compacting 
hdfs://C4C1:9000/hbase/ufdr/4418fb197685a21f77e151e401cf8b66/value/3442771824694350714.8e9a3b05abe1c3a692999cf5e8dfd9dd-hdfs://C4C1:9000/hbase/ufdr/8e9a3b05abe1c3a692999cf5e8dfd9dd/value/3442771824694350714-top,
 keycount=2843998, bloomtype=NONE, size=892.2m
2011-05-08 17:48:32,340 INFO org.apache.hadoop.hbase.regionse

A question about Unit Test

2011-05-08 Thread Gaojinchao
   private static final HServerAddress HSA =
new HServerAddress("example.org:1234");

On my machine, it always fails because of a DNS error.
The example.org hostname is hard-coded. We could add a function to get the
hostname, like this:

  public static String gethostname()
  {
String hostName = null;
try{
hostName = java.net.InetAddress.getLocalHost().getHostName();
}
catch(java.net.UnknownHostException uhe)
{
..
}

return hostName;
  }


private static final HServerAddress HSA =
new HServerAddress(gethostname() +":1234");
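A self-contained version of this suggestion is sketched below; the "localhost" fallback in the catch block is my assumption, since the original mail elides the exception handling:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class TestHostname {
  // Resolve the local hostname so the test address does not depend on
  // an external DNS entry such as the hard-coded "example.org".
  static String gethostname() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException uhe) {
      // Assumed fallback when local resolution fails.
      return "localhost";
    }
  }

  public static void main(String[] args) {
    // Used in place of new HServerAddress("example.org:1234").
    System.out.println(gethostname() + ":1234");
  }
}
```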





Re: svn commit: r1100300 - in /hbase/branches/0.90: CHANGES.txt src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

2011-05-07 Thread Gaojinchao
I have a silly question. 

This commit makes the parameter hbase.zookeeper.property.maxClientCnxns configurable, 
but it seems like MAX_CACHED_HBASE_INSTANCES needs to be modified as well.
The default of 5000 is hard-coded for HBase, so who would create 5000 connections?
If HBase does, there may be some exceptions.
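The commit under discussion replaces the hard-coded connection limit with a configuration lookup that falls back to 5000. The lookup pattern can be sketched with java.util.Properties standing in for Hadoop's Configuration (an assumption made only to keep the example self-contained):

```java
import java.util.Properties;

class MaxCnxnsSketch {
  // Read an integer property, falling back to a default when the key is
  // absent -- the same pattern as Configuration.getInt(key, 5000).
  static int getInt(Properties props, String key, int defaultValue) {
    String v = props.getProperty(key);
    return v == null ? defaultValue : Integer.parseInt(v.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Key unset: the hard default applies.
    System.out.println(getInt(conf, "hbase.zookeeper.property.maxClientCnxns", 5000));
    // Key set (e.g. by a test environment): that value wins.
    conf.setProperty("hbase.zookeeper.property.maxClientCnxns", "30");
    System.out.println(getInt(conf, "hbase.zookeeper.property.maxClientCnxns", 5000));
  }
}
```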




-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: May 7, 2011 2:03
To: dev@hbase.apache.org
Subject: Re: svn commit: r1100300 - in /hbase/branches/0.90: CHANGES.txt 
src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java 
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 
src/test/resources/hbase-site.xml

Is there another constructor in there Andrew that was assigning data
members -- arrays and stuff?
St.Ack

On Fri, May 6, 2011 at 10:43 AM,   wrote:
> Author: apurtell
> Date: Fri May  6 17:43:06 2011
> New Revision: 1100300
>
> URL: http://svn.apache.org/viewvc?rev=1100300&view=rev
> Log:
> HBASE-3861 MiniZooKeeperCluster should refer to maxClientCnxns
>
> Modified:
>    hbase/branches/0.90/CHANGES.txt
>    
> hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
>    
> hbase/branches/0.90/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
>    hbase/branches/0.90/src/test/resources/hbase-site.xml
>
> Modified: hbase/branches/0.90/CHANGES.txt
> URL: 
> http://svn.apache.org/viewvc/hbase/branches/0.90/CHANGES.txt?rev=1100300&r1=1100299&r2=1100300&view=diff
> ==
> --- hbase/branches/0.90/CHANGES.txt (original)
> +++ hbase/branches/0.90/CHANGES.txt Fri May  6 17:43:06 2011
> @@ -41,6 +41,8 @@ Release 0.90.3 - Unreleased
>    HBASE-3821  "NOT flushing memstore for region" keep on printing for half
>                an hour (zhoushuaifeng)
>    HBASE-3848  request is always zero in WebUI for region server (gaojinchao)
> +   HBASE-3861  MiniZooKeeperCluster should refer to maxClientCnxns (Eugene
> +               Koontz via Andrew Purtell)
>
>   IMPROVEMENT
>    HBASE-3717  deprecate HTable isTableEnabled() methods in favor of 
> HBaseAdmin
>
> Modified: 
> hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
> URL: 
> http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java?rev=1100300&r1=1100299&r2=1100300&view=diff
> ==
> --- 
> hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
>  (original)
> +++ 
> hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
>  Fri May  6 17:43:06 2011
> @@ -31,7 +31,9 @@ import java.net.Socket;
>
>  import org.apache.commons.logging.Log;
>  import org.apache.commons.logging.LogFactory;
> +import org.apache.hadoop.conf.Configuration;
>  import org.apache.hadoop.fs.FileUtil;
> +import org.apache.hadoop.hbase.HBaseConfiguration;
>  import org.apache.zookeeper.server.NIOServerCnxn;
>  import org.apache.zookeeper.server.ZooKeeperServer;
>  import org.apache.zookeeper.server.persistence.FileTxnLog;
> @@ -53,9 +55,17 @@ public class MiniZooKeeperCluster {
>   private NIOServerCnxn.Factory standaloneServerFactory;
>   private int tickTime = 0;
>
> +  private Configuration configuration;
> +
>   /** Create mini ZooKeeper cluster. */
>   public MiniZooKeeperCluster() {
> +    this(HBaseConfiguration.create());
> +  }
> +
> +  /** Create mini ZooKeeper cluster with configuration (usually from test 
> environment) */
> +  public MiniZooKeeperCluster(Configuration configuration) {
>     this.started = false;
> +    this.configuration = configuration;
>   }
>
>   public void setClientPort(int clientPort) {
> @@ -105,8 +115,9 @@ public class MiniZooKeeperCluster {
>     ZooKeeperServer server = new ZooKeeperServer(dir, dir, tickTimeToUse);
>     while (true) {
>       try {
> +        int numberOfConnections = 
> this.configuration.getInt("hbase.zookeeper.property.maxClientCnxns",5000);
>         standaloneServerFactory =
> -          new NIOServerCnxn.Factory(new InetSocketAddress(clientPort));
> +         new NIOServerCnxn.Factory(new InetSocketAddress(clientPort), 
> numberOfConnections);
>       } catch (BindException e) {
>         LOG.info("Failed binding ZK Server to client port: " + clientPort);
>         //this port is already in use. try to use another
>
> Modified: 
> hbase/branches/0.90/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
> URL: 
> http://svn.apache.org/viewvc/hbase/branches/0.90/src/test/java/org/apache/hado

Re: svn commit: r1099971 - /hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java

2011-05-05 Thread Gaojinchao
Hi Stack:
Does this also need the 3744-addendum-for-TestAdmin.patch by Ted Yu applied? 

https://issues.apache.org/jira/browse/HBASE-3744

-----Original Message-----
From: st...@apache.org [mailto:st...@apache.org] 
Sent: May 6, 2011 5:20
To: comm...@hbase.apache.org
Subject: svn commit: r1099971 - 
/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java

Author: stack
Date: Thu May  5 21:20:08 2011
New Revision: 1099971

URL: http://svn.apache.org/viewvc?rev=1099971&view=rev
Log:
Fix up jenkins build failure in TestAdmin.testCreateTableWithRegions... address 
can be null when doing compareTo

Modified:

hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java

Modified: 
hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java
URL: 
http://svn.apache.org/viewvc/hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java?rev=1099971&r1=1099970&r2=1099971&view=diff
==
--- 
hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java 
(original)
+++ 
hbase/branches/0.90/src/main/java/org/apache/hadoop/hbase/HServerAddress.java 
Thu May  5 21:20:08 2011
@@ -187,6 +187,8 @@ public class HServerAddress implements W
 // Addresses as Strings may not compare though address is for the one
 // server with only difference being that one address has hostname
 // resolved whereas other only has IP.
+if (this.address == null) return -1;
+if (o.address == null) return 1;
 if (address.equals(o.address)) return 0;
 return toString().compareTo(o.toString());
   }




3744-addendum-for-TestAdmin.patch need apply to branch and trunk

2011-05-02 Thread Gaojinchao
Running only TestAdmin succeeds, but running all the test cases in my cluster 
always fails.
1. I think it needs the 3744-addendum-for-TestAdmin.patch by Ted Yu applied.
The reason:
In version 0.90.2, regions are assigned to region servers by this code:
// 5. Trigger immediate assignment of the regions in round-robin fashion
   List servers = serverManager.getOnlineServersList();
   try {
 this.assignmentManager.assignUserRegions(Arrays.asList(newRegions), 
servers); // It waits for up to 10 minutes.
   } catch (InterruptedException ie) {
 LOG.error("Caught " + ie + " during round-robin assignment");
 throw new IOException(ie);
   }

But in version 0.90.3, regions can't be assigned to region servers (HBASE-3744, 
by Ted Yu, introduced a change in how createTable() works).

 // 5. Trigger immediate assignment of the regions in round-robin fashion
   List servers = serverManager.getOnlineServersList();
   this.assignmentManager.bulkAssignUserRegions(newRegions, servers, sync); 
    // It doesn't wait.

So the function verifyRoundRobinDistribution can't get the address and throws exceptions:
List regs = server2Regions.get(server);

public int hashCode() {
   int result = address.hashCode(); 
// The region can't be assigned, so the address is null and this line throws an NPE.
   result ^= stringValue.hashCode();
   return result;
 }
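The failure mode can be reproduced with a minimal stand-in for HServerAddress (a simplified hypothetical class, not the real one): hashCode() dereferences the address field, so an address that was never resolved or assigned throws the NullPointerException seen in the TestAdmin failure.

```java
import java.net.InetSocketAddress;

class AddressSketch {
  final InetSocketAddress address;  // null when the region was never assigned
  final String stringValue;

  AddressSketch(InetSocketAddress address, String stringValue) {
    this.address = address;
    this.stringValue = stringValue;
  }

  @Override
  public int hashCode() {
    // Mirrors HServerAddress.hashCode(): NPE when address is null.
    int result = address.hashCode();
    result ^= stringValue.hashCode();
    return result;
  }

  public static void main(String[] args) {
    AddressSketch unassigned = new AddressSketch(null, "host:1234");
    try {
      unassigned.hashCode();  // e.g. triggered by a HashMap.get() lookup
      System.out.println("no exception");
    } catch (NullPointerException npe) {
      System.out.println("NPE, as in the TestAdmin failure");
    }
  }
}
```

This is why either the verification must wait until assignment completes (so address is non-null) or the class needs null guards like the compareTo() fix committed in r1099971.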


re: HConnectionManager max cached instances

2011-04-28 Thread Gaojinchao
In version 0.90.3, it has been set to 2001.

-----Original Message-----
From: lohit [mailto:lohit.vijayar...@gmail.com] 
Sent: April 29, 2011 10:52
To: dev@hbase.apache.org
Subject: Re: HConnectionManager max cached instances

Thanks for the pointer. Even if we increase
hbase.zookeeper.property.maxClientCnxns
wouldn't the cache size still be 31?
I was wondering if there was a way to avoid the cache miss so that it would not
create a new ZK connection.

2011/4/28 Ted Yu 

> Increase value for the following parameter:
>  
>hbase.zookeeper.property.maxClientCnxns
>30
>
> Please read my blog for long-term fix which should be commited soon:
>
> http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html
>
> On Thu, Apr 28, 2011 at 7:26 PM, lohit  wrote:
>
> > Hi,
> >
> > By looking at HConnectionManager it looks like from a single node a max
> of
> > 31 connections can be cached.
> >
> > >  static final int MAX_CACHED_HBASE_INSTANCES = 31;
> >
> > I read the comment that this is based on the assumption that max client
> > connections to zookeeper is 30 from a single node.
> > ZooKeeper has an option to change that value, but we seem to get a lot of
> > cache misses if we make more than 30 connections from a client.
> > These cache misses end up making new ZK connections, and if the number of
> > HTable operations is high, we see a spike in ZooKeeper connections from a
> > single node.
> >
> > This is seen in hbase 0.90.2. Is there any workaround for this apart from
> > changing the code?
> >
> > --
> > Thanks
> > Lohit
> >
>



-- 
Have a Nice Day!
Lohit
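The behaviour described in this thread follows from a bounded, access-ordered cache: once more than MAX_CACHED_HBASE_INSTANCES distinct keys are in play, every miss evicts an entry and opens a fresh connection. A minimal sketch with LinkedHashMap follows; the constant 31 comes from the thread, while the cache key and "connection" strings are purely illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConnectionCacheSketch {
  static final int MAX_CACHED_HBASE_INSTANCES = 31;
  int connectionsCreated = 0;

  // Access-ordered LinkedHashMap that evicts the least-recently-used
  // entry once the cap is exceeded, like the connection cache in 0.90.
  final Map<String, String> cache =
      new LinkedHashMap<String, String>(32, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
          return size() > MAX_CACHED_HBASE_INSTANCES;
        }
      };

  String getConnection(String confKey) {
    String conn = cache.get(confKey);
    if (conn == null) {
      // Cache miss: stands in for opening a new ZooKeeper connection.
      conn = "zk-connection-" + (++connectionsCreated);
      cache.put(confKey, conn);
    }
    return conn;
  }

  public static void main(String[] args) {
    ConnectionCacheSketch c = new ConnectionCacheSketch();
    for (int i = 0; i < 40; i++) c.getConnection("conf-" + i);  // 40 distinct keys
    // Re-requesting an evicted key misses again and opens yet another connection.
    c.getConnection("conf-0");
    System.out.println(c.connectionsCreated);
  }
}
```

With 40 distinct keys, only the 31 most recently used stay cached, so re-requesting an early key is always a miss: this is the connection spike Lohit observes.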


Re: It seems like a bug for test case//Re: A question about TestAdmin failed.

2011-04-26 Thread Gaojinchao
Thanks for your reply.
I know about HBASE-3744; I had already merged it into my version.

Running only TestAdmin also succeeds,
but running all the test cases fails.
It seems to depend on the machine's configuration.

I will try sleeping for 60 s before calling verifyRoundRobinDistribution.
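Rather than a fixed 60 s sleep, the test could poll until the condition holds or a deadline passes. A generic sketch follows; the helper name, predicate, and timeouts are illustrative, not HBase API:

```java
import java.util.function.BooleanSupplier;

class WaitFor {
  // Poll the condition every intervalMs until it is true or timeoutMs
  // elapses; returns whether the condition became true. This replaces a
  // blind Thread.sleep(60000) before verifyRoundRobinDistribution.
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) return true;
      Thread.sleep(intervalMs);
    }
    return condition.getAsBoolean();  // one last check at the deadline
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // Stand-in for "all regions assigned": becomes true after ~50 ms.
    boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 2000, 10);
    System.out.println(ok);
  }
}
```

Polling keeps the fast case fast (returns as soon as assignment finishes) while still bounding the slow case, which a fixed sleep cannot do.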

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: April 27, 2011 12:18
To: Gaojinchao
Cc: dev@hbase.apache.org; Chenjian
Subject: Re: It seems like a bug for test case//Re: A question about TestAdmin 
failed.

HBASE-3744 introduced a change in how createTable() works.
By default, sync parameter is false:
  public void createTable(HTableDescriptor desc, byte [][] splitKeys)
  throws IOException {
createTable(desc, splitKeys, false);
  }
because HBaseAdmin.createTableAsync() doesn't pass sync parameter to the master.

On a Linux machine, TestAdmin passed.
2011/4/26 Gaojinchao <gaojinc...@huawei.com>
It seems like the test case TestAdmin has a bug.

In version 0.90.2, regions are assigned to region servers by this code:
// 5. Trigger immediate assignment of the regions in round-robin fashion
   List servers = serverManager.getOnlineServersList();
   try {
 this.assignmentManager.assignUserRegions(Arrays.asList(newRegions), 
servers); // It waits for up to 10 minutes.
   } catch (InterruptedException ie) {
 LOG.error("Caught " + ie + " during round-robin assignment");
 throw new IOException(ie);
   }

But in version 0.90.3, regions can't be assigned to region servers (issue 
HBASE-3744).

 // 5. Trigger immediate assignment of the regions in round-robin fashion
   List servers = serverManager.getOnlineServersList();
   this.assignmentManager.bulkAssignUserRegions(newRegions, servers, sync); 
   // It doesn't wait for.

So the function verifyRoundRobinDistribution can't get the address and throws exceptions:
List regs = server2Regions.get(server);

public int hashCode() {
   int result = address.hashCode(); // The region can't be assigned, so the 
address is null and this line throws an NPE.
   result ^= stringValue.hashCode();
   return result;
 }



-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: April 26, 2011 21:36
To: u...@hbase.apache.org
Subject: Re: A question about TestAdmin failed.

Stack made some change in trunk to deal with NPE.

FYI

On Tue, Apr 26, 2011 at 5:27 AM, Gaojinchao <gaojinc...@huawei.com> wrote:

> I merged some code into 0.90.2,
> ran the unit tests, and found one failure.
>
> How should I dig into it? Thanks.
> Logs:
>
> ---
> Test set: org.apache.hadoop.hbase.client.TestAdmin
>
> ---
> Tests run: 16, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 649.314
> sec <<< FAILURE!
> testCreateTableWithRegions(org.apache.hadoop.hbase.client.TestAdmin)  Time
> elapsed: 3.475 sec  <<< ERROR!
> java.lang.NullPointerException
>at
> org.apache.hadoop.hbase.HServerAddress.hashCode(HServerAddress.java:149)
>at java.util.HashMap.get(HashMap.java:300)
>at
> org.apache.hadoop.hbase.client.TestAdmin.verifyRoundRobinDistribution(TestAdmin.java:309)
>at
> org.apache.hadoop.hbase.client.TestAdmin.testCreateTableWithRegions(TestAdmin.java:385)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>at
&

It seems like a bug for test case//Re: A question about TestAdmin failed.

2011-04-26 Thread Gaojinchao
It seems like the test case TestAdmin has a bug.

In version 0.90.2, regions are assigned to region servers by this code:
// 5. Trigger immediate assignment of the regions in round-robin fashion
List servers = serverManager.getOnlineServersList();
try {
  this.assignmentManager.assignUserRegions(Arrays.asList(newRegions), 
servers); // It waits for up to 10 minutes.
} catch (InterruptedException ie) {
  LOG.error("Caught " + ie + " during round-robin assignment");
  throw new IOException(ie);
}

But in version 0.90.3, regions can't be assigned to region servers (issue 
HBASE-3744).

  // 5. Trigger immediate assignment of the regions in round-robin fashion
List servers = serverManager.getOnlineServersList();
this.assignmentManager.bulkAssignUserRegions(newRegions, servers, sync);
// It doesn't wait.

So the function verifyRoundRobinDistribution can't get the address and throws exceptions:
List regs = server2Regions.get(server);

public int hashCode() {
int result = address.hashCode(); // The region can't be assigned, so the 
address is null and this line throws an NPE.
result ^= stringValue.hashCode();
return result;
  }



-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: April 26, 2011 21:36
To: u...@hbase.apache.org
Subject: Re: A question about TestAdmin failed.

Stack made some change in trunk to deal with NPE.

FYI

On Tue, Apr 26, 2011 at 5:27 AM, Gaojinchao  wrote:

> I merged some code into 0.90.2,
> ran the unit tests, and found one failure.
>
> How should I dig into it? Thanks.
> Logs:
>
> ---
> Test set: org.apache.hadoop.hbase.client.TestAdmin
>
> ---
> Tests run: 16, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 649.314
> sec <<< FAILURE!
> testCreateTableWithRegions(org.apache.hadoop.hbase.client.TestAdmin)  Time
> elapsed: 3.475 sec  <<< ERROR!
> java.lang.NullPointerException
>at
> org.apache.hadoop.hbase.HServerAddress.hashCode(HServerAddress.java:149)
>at java.util.HashMap.get(HashMap.java:300)
>at
> org.apache.hadoop.hbase.client.TestAdmin.verifyRoundRobinDistribution(TestAdmin.java:309)
>at
> org.apache.hadoop.hbase.client.TestAdmin.testCreateTableWithRegions(TestAdmin.java:385)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
>at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
>at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
>at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
>at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
>at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
>at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
>at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>at
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:62)
>at
> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:140)
>at
> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:165)
>at org.apache.maven.surefire.Surefire.run(Surefire.java:107)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:289)
>at
> org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1005)
>
>