[jira] [Commented] (HBASE-8362) Possible MultiGet optimization
[ https://issues.apache.org/jira/browse/HBASE-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634870#comment-13634870 ]

Anoop Sam John commented on HBASE-8362:
---
Not using row blooms was the thing that came to my mind as well when I saw this issue. We need to see how we can (can we?) make use of the row bloom with seeks. I will explore options in that area, as this will help in some of our use cases as well (not multi get).

Possible MultiGet optimization
--
Key: HBASE-8362
URL: https://issues.apache.org/jira/browse/HBASE-8362
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl

Currently MultiGets are executed on a RegionServer in a single thread, in a loop that handles each Get separately (opening a scanner, seeking, etc.). It seems we could optimize this (per region at least) by opening a single scanner and issuing a reseek for each Get that was requested. I have not tested this yet and have no patch, but I would like to solicit feedback on this idea.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
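The optimization proposed above -- one scanner issuing a reseek per Get instead of opening a fresh scanner per Get -- can be sketched in plain Java over a sorted map standing in for a region's store. All names here are hypothetical illustrations, not the actual HBase scanner API:

```java
import java.util.*;

// Sketch: serve a batch of Get keys with ONE forward-only scanner that is
// "reseeked" from key to key, instead of a new scanner per key.
// Results come back in sorted-key order, not request order.
public class MultiGetSketch {
    static List<String> multiGet(NavigableMap<String, String> region, List<String> keys) {
        List<String> results = new ArrayList<>();
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // a reseek can only move forward
        Iterator<Map.Entry<String, String>> scanner = region.entrySet().iterator();
        Map.Entry<String, String> cur = scanner.hasNext() ? scanner.next() : null;
        for (String key : sorted) {
            // "reseek": advance the single scanner until we reach the key
            while (cur != null && cur.getKey().compareTo(key) < 0) {
                cur = scanner.hasNext() ? scanner.next() : null;
            }
            results.add(cur != null && cur.getKey().equals(key) ? cur.getValue() : null);
        }
        return results;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> region = new TreeMap<>();
        region.put("row1", "a"); region.put("row3", "b"); region.put("row5", "c");
        System.out.println(multiGet(region, Arrays.asList("row3", "row1", "row4"))); // [a, b, null]
    }
}
```

The win is that seek state (block index position, loaded blocks) is reused across Gets in the same region, at the cost of having to sort the requested keys first.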
[jira] [Commented] (HBASE-8319) hbase-it tests are run when you ask to run all tests -- it should be that you have to ask explicitly to run them
[ https://issues.apache.org/jira/browse/HBASE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634874#comment-13634874 ]

Nicolas Liochon commented on HBASE-8319:

bq. Can this be moved into a script in dev-tools?

IMHO, the fewer moving parts in Jenkins, the better. It adds a level of indirection, so it makes things more complex and more fragile. See, for example, the Jenkins configuration that said 1.7 while the script was pretending to use OpenJDK 1.6, before finally running with Oracle 1.6.

bq. Who controls the jenkins config screen?

It's an access right per project. I have it, Ted has it, and you could probably have it if you ask Stack (if you don't have it already, which is possible as well).

bq. I wonder how it didn't happen before. Must be something in our pom that changed.

It's unlikely, as it's hardcoded in Maven (that's the advantage of Maven). More probably, it was changed a while ago, and as we were failing 90% of the time, if not more, in the hbase-server part, no one saw the impact on hbase-it.

hbase-it tests are run when you ask to run all tests -- it should be that you have to ask explicitly to run them
--
Key: HBASE-8319
URL: https://issues.apache.org/jira/browse/HBASE-8319
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: Sergey Shelukhin
Priority: Critical
Fix For: 0.95.1

Up on trunk and on the 0.95 Apache builds, Sergey noticed that hbase-it tests are running. Up to this point, the convention was that you had to explicitly ask for them to run, but that changed somehow recently. These tests are heavyweight, take a long time to complete, and are very likely to fail on the Apache infra (which is what we want of them, but not as part of the general proofing build). For now the tests have been artificially disabled on builds.apache.org, but their inclusion likely frustrates joe blow trying to do a local hbase packaging.
[jira] [Commented] (HBASE-8362) Possible MultiGet optimization
[ https://issues.apache.org/jira/browse/HBASE-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634877#comment-13634877 ]

Nicolas Liochon commented on HBASE-8362:

bq. It is not really that common to provide a filter for Gets and even timeranges are not used that often, so we could just only do this for Get with either of those.

We could change the API to support a single timerange and filter for a multiget. I bet it would cover 99% of the use cases.
[jira] [Updated] (HBASE-7239) Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs
[ https://issues.apache.org/jira/browse/HBASE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HBASE-7239:
---
Attachment: 7239-1.patch

Patch that mimics what was done in the 0.94 codebase for the Result class.

Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs
-
Key: HBASE-7239
URL: https://issues.apache.org/jira/browse/HBASE-7239
Project: HBase
Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Critical
Fix For: 0.95.1
Attachments: 7239-1.patch

Result.readFields() used to read from the input stream in 8k chunks to avoid OOM issues with direct memory. (Reading variable-sized chunks into direct memory prevents the JVM from reusing the allocated direct memory, and direct memory is only collected during full GCs.) This is just to verify that protobuf's parseFrom-type methods do the right thing as well, so that we do not reintroduce this problem.
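The fixed-chunk read pattern the description refers to can be sketched generically (this is an illustration of the technique, not the actual Result.readFields() code): read a payload of known length in fixed 8 KB requests, so any NIO layer underneath only ever needs one reusable direct-buffer size instead of a new allocation per payload size.

```java
import java.io.*;

// Sketch: read `len` bytes from a stream in fixed 8 KB chunks, so a
// direct-buffer-backed layer underneath can reuse one buffer size
// instead of allocating variable-sized direct memory per request.
public class ChunkedRead {
    static final int CHUNK = 8 * 1024;

    static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] out = new byte[len];
        int off = 0;
        while (off < len) {
            int want = Math.min(CHUNK, len - off); // never request more than 8 KB at once
            int got = in.read(out, off, want);
            if (got < 0) throw new EOFException("stream ended at " + off + " of " + len);
            off += got;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20_000]; // larger than two chunks
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] copy = readFully(new ByteArrayInputStream(data), data.length);
        System.out.println(java.util.Arrays.equals(data, copy)); // true
    }
}
```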
[jira] [Commented] (HBASE-8362) Possible MultiGet optimization
[ https://issues.apache.org/jira/browse/HBASE-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634884#comment-13634884 ]

Varun Sharma commented on HBASE-8362:
-
Or add a new API and retain the older API for the exotic 1%?
[jira] [Comment Edited] (HBASE-7239) Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs
[ https://issues.apache.org/jira/browse/HBASE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634883#comment-13634883 ]

Devaraj Das edited comment on HBASE-7239 at 4/18/13 6:19 AM:
-
Patch that mimics what was done in the 0.94 codebase for the Result class. [~stack] [~lhofhansl], please have a look.

was (Author: devaraj): Patch that mimics what was done in the 0.94 codebase for the Result class.
[jira] [Updated] (HBASE-8214) Remove proxy and engine, rely directly on pb generated Service
[ https://issues.apache.org/jira/browse/HBASE-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-8214:
-
Attachment: 8214v2.txt

{code}
Removes engine and proxy. Everywhere we now use the pb Interface explicitly with no shim or decoration (the BlockingInterface to be more specific). Compiles. Most tests pass. Need to fix the remainder.

Pluses:
+ Regularizes our rpc. No more voodoo.
+ Removes at least two layers. We could remove more if we go mess with protoc generation (as per Gary, as per pb recommendation).

Cons:
+ Looks a bit ugly. It is ugly the way we now do security information. Previously, the kerberos and token info was an interface that the rpc interface implemented, but now, because pb does the server and stub implementation creation and they cannot be altered -- not w/o protoc messing -- we have to pass the rpc interface and its security info separately (you cannot take the BlockingService or BlockingStub class and get the token and kerberos Interfaces from them, not w/o a bunch of ugly delegations).

Adds a class per rpc interface named *SecurityInfo. For example, ClientServiceSecurityInfo and AdminServiceSecurityInfo. Their only purpose is carrying the Kerberos and Token info. Gets passed when we set up an rpcserver and when we make an rpcclient stub.

Removes the now useless IpcProtocol, and ditto for interfaces that extended the pb blockinginterfaces such as MasterAdminProtocol. Instead, we now pass the raw BlockingInterface, and the kerberos and token interfaces are moved to the new *SecurityInfo classes.

Bulk of changes are using BlockingInterface instead, plus class removals such as HBaseClientRPC and the support for caching of method invocations. For example, changing AdminProtocol to instead refer to AdminService.BlockingInterface (if you looked at the old AdminProtocol, it implemented BlockingInterface).

The new rpc classes are named RpcClient and RpcServerImplementation (there is a silly RpcServer Interface that is in the way and needs at least some cleanup and probably just removal, but can do that later). These classes have facility to help make the protobuf stub on the client side.

Got rid of MasterService, which only had isMasterRunning in it, and added this method to MasterMonitor and to MasterAdmin -- they both have it now; as said above, we were trying to do some kinda inheritance where both MasterMonitor and MasterAdmin had the isMasterRunning method. Dodgy.

TODO:
+ See if I can make this cleaner still. Would appreciate suggestions on the *SecurityInfo stuff.
+ Fix outstanding tests.
+ There is a bit of a mess still in HConnectionManager around isMasterRunning. It was working because we had faked pb service inheritance, something pb does not support and something the pb fellows are against in principle. Did some fixup but it is typeless. Need to spend more time on it.
{code}

Remove proxy and engine, rely directly on pb generated Service
--
Key: HBASE-8214
URL: https://issues.apache.org/jira/browse/HBASE-8214
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Reporter: stack
Assignee: stack
Attachments: 8124.txt, 8214v2.txt

The attached patch is not done. It removes two to three layers -- depending on how you count -- between the client and the rpc client, and similar on the server side (between rpc and the server implementation). Strips ProtobufRpcServer/Client and HBaseClientRpc/HBaseServerRpc. Also gets rid of the proxy.
[jira] [Commented] (HBASE-8362) Possible MultiGet optimization
[ https://issues.apache.org/jira/browse/HBASE-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634930#comment-13634930 ]

Nicolas Liochon commented on HBASE-8362:

bq. Or add a new API and retain the older API for the exotic 1 % ?

+1 (that's what I wanted to say, actually)
[jira] [Commented] (HBASE-8366) HBaseServer logs the full query.
[ https://issues.apache.org/jira/browse/HBASE-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634936#comment-13634936 ]

Nicolas Liochon commented on HBASE-8366:

After some thinking, I think the best option is to remove both of them (mine + the one mentioned by Himanshu). That's the only viable long-term option. I will do this if nobody objects.

HBaseServer logs the full query.
--
Key: HBASE-8366
URL: https://issues.apache.org/jira/browse/HBASE-8366
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.95.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Fix For: 0.98.0, 0.95.1
Attachments: 8366.v1.patch

We log the query when we have an error. As a result, the logs are not readable when using stuff like multi. As a side note, this is also a security issue (no need to encrypt the network and the storage if the logs contain everything). I'm not removing the full log line here; but just ask and I'll do it :-).
[jira] [Commented] (HBASE-8360) In HBaseClient#cancelConnections we should close fully the connection
[ https://issues.apache.org/jira/browse/HBASE-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634938#comment-13634938 ]

Nicolas Liochon commented on HBASE-8360:

Unrelated error. I will commit tomorrow my time if nobody objects.

In HBaseClient#cancelConnections we should close fully the connection
-
Key: HBASE-8360
URL: https://issues.apache.org/jira/browse/HBASE-8360
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.95.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Fix For: 0.98.0, 0.95.1
Attachments: 8860.v1.patch

Otherwise the clients are not disconnected and hence still depend on the timeout...
[jira] [Commented] (HBASE-8214) Remove proxy and engine, rely directly on pb generated Service
[ https://issues.apache.org/jira/browse/HBASE-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634940#comment-13634940 ]

stack commented on HBASE-8214:
--
I updated rb: https://reviews.apache.org/r/10174/
[jira] [Commented] (HBASE-8359) Too much logs on HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634941#comment-13634941 ]

Nicolas Liochon commented on HBASE-8359:

Committed to 0.95 and trunk. Thanks for the review, Andrew and Sergey!

Too much logs on HConnectionManager
---
Key: HBASE-8359
URL: https://issues.apache.org/jira/browse/HBASE-8359
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.95.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Attachments: 8359.v1.patch

Under a YCSB load test (for HBASE-6295), we can have sporadic bulks of logs because of this:
{code}
final RegionMovedException rme = RegionMovedException.find(exception);
if (rme != null) {
  LOG.info("Region " + regionInfo.getRegionNameAsString() + " moved to " +
    rme.getHostname() + ":" + rme.getPort() + " according to " + source.getHostnamePort());
  updateCachedLocation(regionInfo, source, rme.getServerName(), rme.getLocationSeqNum());
} else if (RegionOpeningException.find(exception) != null) {
  LOG.info("Region " + regionInfo.getRegionNameAsString() + " is being opened on " +
    source.getHostnamePort() + "; not deleting the cache entry");
} else {
  deleteCachedLocation(regionInfo, source);
}
{code}
They should just be removed.
[jira] [Updated] (HBASE-8359) Too much logs on HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Liochon updated HBASE-8359:
---
Resolution: Fixed
Fix Version/s: 0.95.1, 0.98.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
[jira] [Commented] (HBASE-8366) HBaseServer logs the full query.
[ https://issues.apache.org/jira/browse/HBASE-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634954#comment-13634954 ]

stack commented on HBASE-8366:
--
Sorry for this, lads. I am going to come back and fix this (thanks for filing the issue). TextFormat was useful for debugging the ipc but yeah, too verbose. On the other hand, because we are all pb now, we can log a TextFormat shorthand and print out stuff like region and row, which will help when a query is tooSlow or tooBig. TextFormat is not subclassable, so it would be an hbase form of TextFormat. I can assign this to myself, since I have a notion of how it should be, if that is ok w/ you lot.
[jira] [Commented] (HBASE-8366) HBaseServer logs the full query.
[ https://issues.apache.org/jira/browse/HBASE-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634967#comment-13634967 ]

Nicolas Liochon commented on HBASE-8366:

I'm fine if you do it :-).
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634972#comment-13634972 ]

Matteo Bertozzi commented on HBASE-8369:

In general I'm against having another way to directly access the data, since it means that you're giving up on optimizing the main one. But if the final implementation will be like this one, using the HRegion object, I'll be +1.

MapReduce over snapshot files
-
Key: HBASE-8369
URL: https://issues.apache.org/jira/browse/HBASE-8369
Project: HBase
Issue Type: New Feature
Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.98.0, 0.95.2
Attachments: hbase-8369_v0.patch

The idea is to add an InputFormat which can run a mapreduce job over snapshot files directly, bypassing the hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits. Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok:
- Take snapshots periodically, and run MR jobs only on snapshots.
- Export snapshots to a remote hdfs cluster, and run the MR jobs at that cluster without an HBase cluster.
- (Future use case) Combine snapshot data with online hbase data: scan from yesterday's snapshot, but read today's data from the online hbase cluster.
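The one-split-per-region shape described above can be illustrated with a plain-Java sketch (hypothetical types; the real InputFormat would produce Hadoop InputSplits and open an HRegion inside each RecordReader):

```java
import java.util.*;

// Sketch: a snapshot modeled as region name -> rows; planning a job means
// emitting one "split" per region, each split scanned independently the way
// a RecordReader would scan one HRegion -- no region server involved.
public class SnapshotSplitsSketch {
    static List<List<String>> splitsFor(Map<String, List<String>> snapshotRegions) {
        List<List<String>> splits = new ArrayList<>();
        for (Map.Entry<String, List<String>> region : snapshotRegions.entrySet()) {
            splits.add(region.getValue()); // one split per region
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> snapshot = new LinkedHashMap<>();
        snapshot.put("region-a", Arrays.asList("r1", "r2"));
        snapshot.put("region-b", Arrays.asList("r3"));
        System.out.println(splitsFor(snapshot).size()); // 2
    }
}
```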
[jira] [Commented] (HBASE-8365) Duplicated ZK notifications cause Master abort (or other unknown issues)
[ https://issues.apache.org/jira/browse/HBASE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635031#comment-13635031 ]

ramkrishna.s.vasudevan commented on HBASE-8365:
---
I went through the logs once again. One thing that is surprising is how two nodeDataChanged events occurred for FAILED_OPEN. Is it that when the znode got changed to OPENING twice and then to FAILED_OPEN, we got one event per change, and by the time it tried to process them, the first two times the data it got was FAILED_OPEN, but the third time it was OPENING due to some other, later assign operation? Thanks.

Duplicated ZK notifications cause Master abort (or other unknown issues)
--
Key: HBASE-8365
URL: https://issues.apache.org/jira/browse/HBASE-8365
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.6
Reporter: Jeffrey Zhong
Attachments: TestResult.txt

The duplicated ZK notifications should happen in trunk as well. Since the way we handle ZK notifications is different in trunk, we don't see the issue there; I'll explain later. The issue is causing TestMetaReaderEditor.testRetrying to be flaky, with error message {code}reader: count=2, t=null{code}. A related link is at https://builds.apache.org/job/HBase-0.94/941/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/. The test case failure is due to an IllegalStateException, and the master is aborted, so the remaining test cases also failed after testRetrying.

Below are the steps showing why the issue is happening (region fa0e7a5590feb69bd065fbc99c228b36 is the one of interest):

1) Got the first notification event, RS_ZK_REGION_FAILED_OPEN, at 2013-04-04 17:39:01,197:
{code}
DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN, server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36
{code}
In this step, the AM tries to open the region on another RS in a separate thread.

2) Got a second notification event, RS_ZK_REGION_FAILED_OPEN, at 2013-04-04 17:39:01,200:
{code}
DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN, server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36
{code}

3) Later got the opening notification event resulting from step 1, at 2013-04-04 17:39:01,288:
{code}
DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_OPENING, server=janus.apache.org,54833,1365097126175, region=fa0e7a5590feb69bd065fbc99c228b36
{code}

Step 2's ClosedRegionHandler throws an IllegalStateException because it cannot transit the region to OFFLINE (the state is OPENING from notification 3), and it aborts the Master. This could happen in 0.94 because we handle notifications using an executorService, which opens the door to handling events out of order even though we receive them in the order of the updates. I've confirmed that we don't have duplicated AM listeners, and that both events were triggered by the same ZK data of the exact same version. The issue can be reproduced by running the testRetrying test case 20 times in a loop.

There are several issues behind the failure:

1) Duplicated ZK notifications. Since a ZK watcher is a one-time trigger, duplicated notifications should not happen from the same data of the same version in the first place.

2) ZooKeeper watcher handling is wrong in both 0.94 and trunk, as follows:
a) 0.94 handles notifications in an async way, which may lead to handling notifications out of the order in which the events happened.
b) In trunk, we handle ZK notifications synchronously, which slows down other components such as SSH, log splitting, etc., because we have a single notification queue.
c) In both trunk and 0.94, we could use stale event data because we have a long listener list. The ZK node state could have changed by the time the event is handled. If a listener needs to act upon the latest state, it should re-fetch the data to verify that the data which triggered the handler hasn't changed.

Suggestions: For 0.94, we can band-aid the CloseRegionHandler to pass in the expected ZK data version, to skip event handling on stale data with minimal impact. For trunk, I'll open an improvement JIRA on ZK notification handling to provide more parallelism for handling unrelated notifications. For the duplicated ZK notifications, we need to bring in some ZK experts to take a look. Please let me know what you think, or any better idea. Thanks!
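The 0.94 suggestion above -- pass the expected ZK data version into the handler and skip stale events -- amounts to a compare-before-act guard. A minimal sketch (hypothetical names, not the actual CloseRegionHandler code):

```java
// Sketch: a handler remembers the data version it was created for and
// re-checks the znode's current version before acting, so a duplicated
// or superseded notification is dropped instead of handled.
public class VersionedHandlerSketch {
    interface Znode { int getVersion(); }

    static boolean handleIfCurrent(Znode znode, int expectedVersion, Runnable action) {
        if (znode.getVersion() != expectedVersion) {
            return false; // stale event: the znode changed since the watch fired
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        Znode node = () -> 5; // current data version is 5
        System.out.println(handleIfCurrent(node, 4, () -> {})); // false: stale
        System.out.println(handleIfCurrent(node, 5, () -> {})); // true: current
    }
}
```

Note this only narrows the race (the version can still change between the check and the action); in real ZooKeeper usage the version would be compared via the Stat returned by getData, or enforced with a versioned setData.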
[jira] [Commented] (HBASE-8365) Duplicated ZK notifications cause Master abort (or other unknown issues)
[ https://issues.apache.org/jira/browse/HBASE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635041#comment-13635041 ]

ramkrishna.s.vasudevan commented on HBASE-8365:
---
If I recollect from what I have debugged, the nodeDataChanged event will only give the latest data, because it will not be able to read the old data. I may be wrong here.
[jira] [Updated] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-8329: Attachment: HBASE-8329-trunk.patch Limit compaction speed -- Key: HBASE-8329 URL: https://issues.apache.org/jira/browse/HBASE-8329 Project: HBase Issue Type: Improvement Components: Compaction Reporter: binlijin Attachments: HBASE-8329-trunk.patch There is no speed or resource limit for compaction; I think we should add this feature, especially for when requests burst. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
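One simple way to bound compaction throughput is to pause the writer after each chunk so the average rate stays under a configured limit. The sketch below illustrates that idea only; it is not the attached patch, and the names are hypothetical.

```java
// Illustrative throughput limiter: after writing n bytes, the caller pauses
// for the number of milliseconds returned, keeping the average write rate
// at or below bytesPerSecond. Hypothetical names, not the HBASE-8329 patch.
public class RateLimiter {
    private final long bytesPerSecond;
    private long bytesSinceSleep = 0;

    public RateLimiter(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
    }

    /** Call after writing n bytes; returns the pause in ms this chunk earned. */
    public long onBytesWritten(long n) {
        bytesSinceSleep += n;
        long pauseMs = bytesSinceSleep * 1000 / bytesPerSecond;
        if (pauseMs > 0) {
            bytesSinceSleep = 0; // the caller would Thread.sleep(pauseMs) here
        }
        return pauseMs;
    }
}
```

A compaction loop would call onBytesWritten after each flushed block and sleep for the returned duration, trading compaction latency for steadier request-serving capacity.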
[jira] [Created] (HBASE-8372) Provide mutability to CompoundConfiguration
Ted Yu created HBASE-8372: - Summary: Provide mutability to CompoundConfiguration Key: HBASE-8372 URL: https://issues.apache.org/jira/browse/HBASE-8372 Project: HBase Issue Type: New Feature Reporter: Ted Yu In the discussion of HBASE-8347, it was proposed that CompoundConfiguration should support mutability. This can be done by consolidating the ImmutableConfigMaps on the first modification to CompoundConfiguration. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
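The consolidation idea can be sketched as follows: reads walk a list of immutable maps in precedence order, and the first write flattens them into a single mutable map that takes over. This is a toy model of the proposal, not the actual CompoundConfiguration code.

```java
import java.util.*;

// Toy model of the proposed copy-on-first-write consolidation.
// Reads search the immutable maps in precedence order; the first set()
// flattens them into one mutable map, preserving precedence.
public class CompoundConfigSketch {
    private final List<Map<String, String>> immutableMaps = new ArrayList<>();
    private Map<String, String> mutable; // null until first modification

    public void addImmutable(Map<String, String> m) {
        immutableMaps.add(0, Collections.unmodifiableMap(m)); // later maps win
    }

    public String get(String key) {
        if (mutable != null) return mutable.get(key);
        for (Map<String, String> m : immutableMaps) {
            String v = m.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public void set(String key, String value) {
        if (mutable == null) {
            // Consolidate: flatten lowest precedence first so that
            // higher-precedence entries overwrite them.
            mutable = new HashMap<>();
            for (int i = immutableMaps.size() - 1; i >= 0; i--) {
                mutable.putAll(immutableMaps.get(i));
            }
        }
        mutable.put(key, value);
    }
}
```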
[jira] [Commented] (HBASE-8359) Too much logs on HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635163#comment-13635163 ] Hudson commented on HBASE-8359: --- Integrated in hbase-0.95-on-hadoop2 #73 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/73/]) HBASE-8359 Too much logs on HConnectionManager (Revision 1469204) Result = FAILURE nkeywal : Files : * /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java Too much logs on HConnectionManager --- Key: HBASE-8359 URL: https://issues.apache.org/jira/browse/HBASE-8359 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.0, 0.95.1 Attachments: 8359.v1.patch Under a YCSB load test (for HBASE-6295), we can get sporadic bursts of logs because of this: {code} final RegionMovedException rme = RegionMovedException.find(exception); if (rme != null) { LOG.info("Region " + regionInfo.getRegionNameAsString() + " moved to " + rme.getHostname() + ":" + rme.getPort() + " according to " + source.getHostnamePort()); updateCachedLocation( regionInfo, source, rme.getServerName(), rme.getLocationSeqNum()); } else if (RegionOpeningException.find(exception) != null) { LOG.info("Region " + regionInfo.getRegionNameAsString() + " is being opened on " + source.getHostnamePort() + "; not deleting the cache entry"); } else { deleteCachedLocation(regionInfo, source); } {code} They should just be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635167#comment-13635167 ] Jean-Marc Spaggiari commented on HBASE-8369: I really like the idea. Are the initCredentials modifications in TableMapReduceUtil required for the scope of this patch? Or are they coming from another scope? MapReduce over snapshot files - Key: HBASE-8369 URL: https://issues.apache.org/jira/browse/HBASE-8369 Project: HBase Issue Type: New Feature Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.95.2 Attachments: hbase-8369_v0.patch The idea is to add an InputFormat, which can run the mapreduce job over snapshot files directly, bypassing the hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits. Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok: - Take snapshots periodically, and run MR jobs only on snapshots. - Export snapshots to a remote hdfs cluster, and run the MR jobs at that cluster without an HBase cluster. - (Future use case) Combine snapshot data with online hbase data: Scan from yesterday's snapshot, but read today's data from the online hbase cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8357) current region server failover mechanism for replication can lead to stale region server whose left hlogs can't be replicated by other region server
[ https://issues.apache.org/jira/browse/HBASE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635201#comment-13635201 ] Himanshu Vashishtha commented on HBASE-8357: [~fenghh] Also, consider using 0.94.6 when using HBase-2611. Please refer to the release notes on HBase-2611. current region server failover mechanism for replication can lead to stale region server whose left hlogs can't be replicated by other region server Key: HBASE-8357 URL: https://issues.apache.org/jira/browse/HBASE-8357 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.3 Reporter: Feng Honghua consider this scenario: rs A/B/C, A dies, B and C race to lock A's znode to help replicate the hlogs A left unreplicated, B wins and successfully creates the lock under A's znode, but before B copies A's hlog queues to its own znode, B also dies, and C successfully creates the lock under B's znode and helps replicate B's own leftover hlogs. But A's leftover hlogs can't be replicated by any other rs, since B left a lock behind under A's znode and didn't transfer A's hlog queues to its own znode before it died. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2611) Handle RS that fails while processing the failure of another one
[ https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-2611: --- Release Note: The fix for this issue uses Zookeeper multi functionality (hbase.zookeeper.useMulti). Please refer to hbase-default.xml about this property. There is an addendum fix at HBase-8099 (fixed in 0.94.6). In case you are running on branch 0.94.6, please patch it with HBase-8099, OR make sure hbase.zookeeper.useMulti is set to false. Handle RS that fails while processing the failure of another one Key: HBASE-2611 URL: https://issues.apache.org/jira/browse/HBASE-2611 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Jean-Daniel Cryans Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.94.5, 0.95.0 Attachments: 2611-0.94.txt, 2611-trunk-v3.patch, 2611-trunk-v4.patch, 2611-v3.patch, HBASE-2611-trunk-v2.patch, HBASE-2611-trunk-v3.patch, HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch HBASE-2223 doesn't manage region servers that fail while doing the transfer of HLogs queues from other region servers that failed. Devise a reliable way to do it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
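As the release note says, the fix relies on ZooKeeper multi-update support, gated by a configuration property. An hbase-site.xml fragment enabling it would look like this (a sketch; hbase-default.xml remains the authoritative description of the property):

```xml
<!-- hbase-site.xml: enable the atomic ZooKeeper multi operations used by
     the HBASE-2611 fix; requires a ZooKeeper version that supports multi. -->
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>true</value>
</property>
```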
[jira] [Created] (HBASE-8373) Update Rolling Restart documentation
Jean-Marc Spaggiari created HBASE-8373: -- Summary: Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh is already taking care of that: {code} echo Disabling balancer! echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8373) Update Rolling Restart documentation
[ https://issues.apache.org/jira/browse/HBASE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8373: --- Attachment: HBASE-8373-v0-trunk.patch Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-8373-v0-trunk.patch Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh is already taking care of that: {code} echo Disabling balancer! echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8373) Update Rolling Restart documentation
[ https://issues.apache.org/jira/browse/HBASE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8373: --- Status: Patch Available (was: Open) Documentation update attached. Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-8373-v0-trunk.patch Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh is already taking care of that: {code} echo Disabling balancer! echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8373) Update Rolling Restart documentation
[ https://issues.apache.org/jira/browse/HBASE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635224#comment-13635224 ] Hadoop QA commented on HBASE-8373: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579330/HBASE-8373-v0-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5343//console This message is automatically generated. Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-8373-v0-trunk.patch Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh is already taking care of that: {code} echo Disabling balancer! echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635225#comment-13635225 ] Nicolas Liochon commented on HBASE-6295: Load test with YCSB on EC2. Lots of problems. The server seems sensitive to the workload, and being asynchronous adds some workload. Here is a stack with a moderate setting. I don't get the UnknownHostException: ip-10-4-226-168; maybe there are too many calls to AWS... {noformat} 2013-04-18 10:41:33,377 INFO [regionserver60020-smallCompactions-1366296026287] org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom f 2013-04-18 10:41:37,849 FATAL [regionserver60020.logRoller] org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ip-10-4-229-217 java.io.IOException: cannot get log writer at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:162) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.createWriterInstance(FSHLog.java:591) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:533) at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:169) at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:159) ... 
4 more Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:123) at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:149) at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:234) at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342) at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441) at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339) at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:453) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:150) ... 5 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:121) ... 
20 more Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: ip-10-4-226-168 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:415) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:382) at org.apache.hadoop.fs.Hdfs.init(Hdfs.java:85) ... 25 more Caused by: java.net.UnknownHostException: ip-10-4-226-168 ... 31 more 2013-04-18 10:41:37,851 FATAL [regionserver60020.logRoller] org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessor 2013-04-18 10:41:37,863 INFO [regionserver60020.logRoller] org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: IOE in log roller {noformat} Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance
[jira] [Created] (HBASE-8374) NPE when launching the balance
Nicolas Liochon created HBASE-8374: -- Summary: NPE when launching the balance Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.init(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key '' {noformat} {code} if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635227#comment-13635227 ] Nicolas Liochon commented on HBASE-6295: On the good news part, it seems it does what we expect: performance is 25% better, even with a dead regionserver. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch today the batch algo is: {noformat} for Operation o: List<Op> { add o to todolist if todolist > maxsize or o last in list split todolist per location send split lists to region servers clear todolist wait } {noformat} We could: - create immediately the final object instead of an intermediate array - split per location immediately - instead of sending when the list as a whole is full, send it when there is enough data for a single location It would be: {noformat} for Operation o: List<Op> { get location add o to location.todolist if (location.todolist > maxLocationSize) send location.todolist to region server clear location.todolist // don't wait, continue the loop } send remaining wait {noformat} It's not trivial to write if you add error management: the retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
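The proposed loop can be sketched in plain Java: operations are buffered per location, and each buffer is flushed to its server as soon as it reaches its own threshold, without waiting for the global list to fill. All names here (submit, flush, the string stand-ins for operations and locations) are illustrative, not the HBase client API.

```java
import java.util.*;

// Sketch of the per-location batching idea from the comment above.
// Operations are grouped by location on arrival; a location's buffer is
// sent as soon as it is full, and remainders are sent at the end.
public class PerLocationBatcher {
    static final int MAX_LOCATION_SIZE = 2; // tiny threshold for the demo

    final Map<String, List<String>> buffers = new HashMap<>();
    final List<String> sent = new ArrayList<>(); // records flushes for the demo

    void submit(String location, String op) {
        List<String> buf = buffers.computeIfAbsent(location, k -> new ArrayList<>());
        buf.add(op);
        if (buf.size() >= MAX_LOCATION_SIZE) {
            flush(location, buf); // don't wait, the loop continues
        }
    }

    void flush(String location, List<String> buf) {
        sent.add(location + ":" + buf); // stand-in for the RPC to the server
        buf.clear();
    }

    void finish() { // send the remainders; only now would the caller wait
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            if (!e.getValue().isEmpty()) flush(e.getKey(), e.getValue());
        }
    }
}
```

The error-management caveat from the comment applies: a retried operation must rejoin the right location buffer, so the retry list has to be shared with the buffers above.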
[jira] [Commented] (HBASE-8214) Remove proxy and engine, rely directly on pb generated Service
[ https://issues.apache.org/jira/browse/HBASE-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635248#comment-13635248 ] Nick Dimiduk commented on HBASE-8214: - bq. It is ugly the way we now do security information. Dunno if it's interesting, but you might have a look at https://code.google.com/p/protobuf-rpc-pro/wiki/SSLGuide and the relevant implementation. Remove proxy and engine, rely directly on pb generated Service -- Key: HBASE-8214 URL: https://issues.apache.org/jira/browse/HBASE-8214 Project: HBase Issue Type: Bug Components: IPC/RPC Reporter: stack Assignee: stack Attachments: 8124.txt, 8214v2.txt Attached patch is not done. Removes two to three layers -- depending on how you count -- between client and rpc client and similar on server side (between rpc and server implementation). Strips ProtobufRpcServer/Client and HBaseClientRpc/HBaseServerRpc. Also gets rid of proxy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8373) Update Rolling Restart documentation
[ https://issues.apache.org/jira/browse/HBASE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635253#comment-13635253 ] Ted Yu commented on HBASE-8373: --- Can you regenerate the patch? The correct path should be src/main/docbkx/ops_mgt.xml Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-8373-v0-trunk.patch Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh is already taking care of that: {code} echo Disabling balancer! echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8359) Too much logs on HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635259#comment-13635259 ] Hudson commented on HBASE-8359: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #503 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/503/]) HBASE-8359 Too much logs on HConnectionManager (Revision 1469203) Result = FAILURE nkeywal : Files : * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java Too much logs on HConnectionManager --- Key: HBASE-8359 URL: https://issues.apache.org/jira/browse/HBASE-8359 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.0, 0.95.1 Attachments: 8359.v1.patch Under a YCSB load test (for HBASE-6295), we can get sporadic bursts of logs because of this: {code} final RegionMovedException rme = RegionMovedException.find(exception); if (rme != null) { LOG.info("Region " + regionInfo.getRegionNameAsString() + " moved to " + rme.getHostname() + ":" + rme.getPort() + " according to " + source.getHostnamePort()); updateCachedLocation( regionInfo, source, rme.getServerName(), rme.getLocationSeqNum()); } else if (RegionOpeningException.find(exception) != null) { LOG.info("Region " + regionInfo.getRegionNameAsString() + " is being opened on " + source.getHostnamePort() + "; not deleting the cache entry"); } else { deleteCachedLocation(regionInfo, source); } {code} They should just be removed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk.txt I think loc was empty, leading to the NPE. Here is a patch. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Attachments: 8374-trunk.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.init(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. 
starting at key '' {noformat} {code} if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-8374: - Assignee: Ted Yu NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Attachments: 8374-trunk.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.init(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key '' {noformat} {code} if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Fix Version/s: 0.95.1 0.98.0 Status: Patch Available (was: Open) NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.init(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. 
starting at key '' {noformat} {code} if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635271#comment-13635271 ] Jean-Marc Spaggiari commented on HBASE-8374: Hi Ted, I'm not sure about your patch. {code} +if (loc.size() > 0) { + regionLocations[regionIndex] = new int[loc.size()]; + for (int i = 0; i < loc.size(); i++) { + regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); + } } {code} If loc.size() == 0, then the for loop will never run and loc.get(i) will never be called. No? And we know that loc can't be null, else the NPE would have been on loc.size(). So there are two options: 1) loc.get(i) returns null and serversToIndex.get(null) gives the NPE; 2) serversToIndex is null. I think we are facing #1 here.
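Jean-Marc's two observations can be checked with a standalone sketch (hypothetical names, not the HBase code): with an empty list the loop body never runs, so no NPE can come from it, while a HashMap lookup for an absent (or null) key returns null, which only fails when auto-unboxed to int.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LoopSkipDemo {
    // Same shape as the balancer loop: fills an int slot per location.
    static int filledSlots(List<String> loc, Map<String, Integer> serversToIndex) {
        int[] slots = new int[loc.size()];
        for (int i = 0; i < loc.size(); i++) {
            slots[i] = serversToIndex.get(loc.get(i)); // NPE if the key is absent or null
        }
        return slots.length;
    }

    public static void main(String[] args) {
        Map<String, Integer> serversToIndex = new HashMap<>();
        serversToIndex.put("rs1", 0);

        // Empty list: the loop body is skipped entirely, so no NPE is possible here.
        System.out.println(filledSlots(new ArrayList<>(), serversToIndex)); // prints 0

        // Option 1 from the comment: a key the map doesn't know (a null element
        // behaves the same way, since HashMap.get(null) also returns null).
        try {
            filledSlots(List.of("unknown"), serversToIndex);
        } catch (NullPointerException e) {
            System.out.println("NPE from serversToIndex.get() returning null");
        }
    }
}
```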
[jira] [Updated] (HBASE-8373) Update Rolling Restart documentation
[ https://issues.apache.org/jira/browse/HBASE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari updated HBASE-8373: --- Status: Open (was: Patch Available) Update Rolling Restart documentation Key: HBASE-8373 URL: https://issues.apache.org/jira/browse/HBASE-8373 Project: HBase Issue Type: Bug Components: documentation Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Minor Attachments: HBASE-8373-v0-trunk.patch The Rolling Restart documentation specifies that we need to stop the balancer before proceeding. However, bin/graceful_stop.sh already takes care of that: {code} echo "Disabling balancer!" echo 'balance_switch false' | $bin/hbase --config ${HBASE_CONF_DIR} shell {code} So the documentation needs to be updated to remove this recommendation.
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635286#comment-13635286 ] Ted Yu commented on HBASE-8374: --- If loc.size() == 0, there is no information to fill in regionLocations. The serversToIndex.get() call would be skipped, avoiding the NPE.
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635300#comment-13635300 ] Jean-Marc Spaggiari commented on HBASE-8374: My point is that serversToIndex.get() was already skipped with loc.size() == 0, because of the for loop. So I agree that adding the test will avoid the creation of the regionLocations array, but it will not fix the NPE, since it occurred when loc.size() was > 0.
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk-v2.txt Patch v2 addresses the case where (some) ServerName couldn't be determined by regionFinder.getTopBlockLocations().
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635307#comment-13635307 ] Nicolas Liochon commented on HBASE-8374: I think it's serversToIndex.get(loc.get(i)) that returns null. As it's an Integer, it's then cast to int, hence the NPE. So v2 does not remove the NPE, imho. I added some tests locally, but I've not yet reproduced it, so I can't say for sure. If so, the question would be: why does it happen?
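Nicolas's diagnosis — Map.get() returning a null Integer that only fails at the int assignment — is easy to reproduce in isolation (a minimal sketch, not the HBase code):

```java
import java.util.HashMap;
import java.util.Map;

public class AutoUnboxNpe {
    public static void main(String[] args) {
        Map<String, Integer> serversToIndex = new HashMap<>();

        // get() on an absent key returns null without any exception...
        Integer boxed = serversToIndex.get("absent-server");
        System.out.println(boxed == null); // prints true

        // ...but assigning the same result to an int auto-unboxes it via
        // Integer.intValue(), and that call on null is where the NPE fires.
        try {
            int idx = serversToIndex.get("absent-server");
            System.out.println(idx); // never reached
        } catch (NullPointerException e) {
            System.out.println("NPE at the unboxing assignment");
        }
    }
}
```

This is why the stack trace points at the assignment line in BaseLoadBalancer rather than inside the map.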
[jira] [Commented] (HBASE-8353) -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS
[ https://issues.apache.org/jira/browse/HBASE-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635317#comment-13635317 ] ramkrishna.s.vasudevan commented on HBASE-8353: --- I feel the patch is ok. @Rajesh: if we go and check ROOT/META during startup, we again have to see how long we should wait if ROOT/META goes down at that time. What do you think? -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS Key: HBASE-8353 URL: https://issues.apache.org/jira/browse/HBASE-8353 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.94.6 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.94.8 Attachments: HBASE-8353_94.patch ROOT/META are not getting assigned if the master is restarted while closing ROOT/META. Let's suppose the catalog table regions are in M_ZK_REGION_CLOSING state during master initialization; then we just add them to RIT and wait for the TM. {code} if (isOnDeadServer(regionInfo, deadServers) && (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) { // If was on dead server, its closed now. Force to OFFLINE and this // will get it reassigned if appropriate forceOffline(regionInfo, data); } else { // Just insert region into RIT. // If this never updates the timeout will trigger new assignment regionsInTransition.put(encodedRegionName, new RegionState( regionInfo, RegionState.State.CLOSING, data.getStamp(), data.getOrigin())); } {code} isOnDeadServer always returns false for ROOT/META because deadServers is null. Even the TM cannot close them properly, because they are not available in the online regions, since they are not yet assigned.
{code} synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug("Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } } {code}
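The deadServers == null behavior described above can be sketched as follows (hypothetical, simplified signatures — the real method takes region info and a map of dead servers):

```java
import java.util.Set;

public class DeadServerCheckSketch {
    // Simplified version of the check: when deadServers is null (as it is for
    // -ROOT-/.META. on this code path), the method can only answer false, so
    // the region is never forced OFFLINE and stays stuck in RIT.
    static boolean isOnDeadServer(String regionServer, Set<String> deadServers) {
        if (deadServers == null) {
            return false; // the problematic early-out
        }
        return deadServers.contains(regionServer);
    }

    public static void main(String[] args) {
        System.out.println(isOnDeadServer("rs1", null));          // false, even if rs1 is dead
        System.out.println(isOnDeadServer("rs1", Set.of("rs1"))); // true
    }
}
```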
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk-v3.txt See if patch v3 is better.
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635328#comment-13635328 ] Jean-Marc Spaggiari commented on HBASE-8374: Hum, not sure. regionLocations[regionIndex][i] is an int. If you assign null to it you will still get the NPE. Also, as Nicolas says, the question remains: why does it happen?
[jira] [Updated] (HBASE-8279) Performance Evaluation does not consider the args passed in case of more than one client
[ https://issues.apache.org/jira/browse/HBASE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8279: -- Status: Open (was: Patch Available) Performance Evaluation does not consider the args passed in case of more than one client Key: HBASE-8279 URL: https://issues.apache.org/jira/browse/HBASE-8279 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8279_1.patch, HBASE-8279_2.patch, HBASE-8279.patch Performance evaluation provides an option to pass the table name. The table name is honored when we first initialize the table - the disabling and creation of the table happen with the name that we pass. But the write and read tests then use only the default table, so the perf evaluation fails. I think the problem is like this: {code} ./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --table=MyTable2 --presplit=70 randomRead 2 {code} {code} 13/04/04 21:42:07 DEBUG hbase.HRegionInfo: Current INFO from scan results = {NAME => 'MyTable2,0002067171,1365126124904.bc9e936f4f8ca8ee55eb90091d4a13b6.', STARTKEY => '0002067171', ENDKEY => '', ENCODED => bc9e936f4f8ca8ee55eb90091d4a13b6,} 13/04/04 21:42:07 INFO hbase.PerformanceEvaluation: Table created with 70 splits {code} You can see that the specified table is created with the splits.
But when the read starts: {code} Caused by: org.apache.hadoop.hbase.exceptions.TableNotFoundException: TestTable at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1157) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1034) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:984) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:246) at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:187) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testSetup(PerformanceEvaluation.java:851) at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:869) at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1495) at org.apache.hadoop.hbase.PerformanceEvaluation$1.run(PerformanceEvaluation.java:590) {code} It says TestTable, the default table, was not found.
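The mismatch can be illustrated with a toy sketch (hypothetical method names; the real PerformanceEvaluation code is more involved): the setup path resolves the --table option, while the test path hard-codes the default table name.

```java
public class TableNameOptionSketch {
    static final String DEFAULT_TABLE = "TestTable";

    // Setup honors the parsed option, falling back to the default.
    static String tableForSetup(String parsedTableName) {
        return parsedTableName != null ? parsedTableName : DEFAULT_TABLE;
    }

    // Buggy shape: the read/write test path ignores the option entirely.
    static String tableForTestBuggy(String parsedTableName) {
        return DEFAULT_TABLE; // bug: always the default, regardless of --table
    }

    // Fix: reuse the same resolution as setup.
    static String tableForTestFixed(String parsedTableName) {
        return tableForSetup(parsedTableName);
    }

    public static void main(String[] args) {
        String parsed = "MyTable2"; // as if --table=MyTable2 was passed
        System.out.println(tableForSetup(parsed));     // MyTable2 (table gets created)
        System.out.println(tableForTestBuggy(parsed)); // TestTable (which was never created)
        System.out.println(tableForTestFixed(parsed)); // MyTable2
    }
}
```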
[jira] [Updated] (HBASE-8279) Performance Evaluation does not consider the args passed in case of more than one client
[ https://issues.apache.org/jira/browse/HBASE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8279: -- Attachment: HBASE-8279_2.patch Latest patch for trunk. I can commit this if nobody objects.
[jira] [Updated] (HBASE-8279) Performance Evaluation does not consider the args passed in case of more than one client
[ https://issues.apache.org/jira/browse/HBASE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-8279: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk-v4.txt w.r.t. Nicolas' question, I think what happened was that the serversToIndex map wasn't fully populated, because one loop was used to iterate through clusterState.entrySet(). In patch v4, I introduced another loop to populate the serversToIndex map. I kept the check from patch v3 in case regionFinder.getTopBlockLocations() returns some ServerName which is not in the serversToIndex map.
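The two-pass idea behind patch v4 can be sketched with simplified types (strings standing in for ServerName and HRegionInfo; not the actual patch code): index every server first, then resolve locations with a null guard for servers the index doesn't know.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPassIndexSketch {
    // Pass 1: index every server before touching region locations, so lookups
    // in pass 2 cannot return null for any server present in clusterState.
    static Map<String, Integer> buildIndex(Map<String, List<String>> clusterState) {
        Map<String, Integer> serversToIndex = new HashMap<>();
        int next = 0;
        for (String server : clusterState.keySet()) {
            serversToIndex.put(server, next++);
        }
        return serversToIndex;
    }

    // Pass 2: resolve top block locations, skipping any server the index does
    // not know (e.g. one reported by block-location lookup but not in the map).
    static List<Integer> resolve(List<String> topLocations, Map<String, Integer> serversToIndex) {
        List<Integer> out = new ArrayList<>();
        for (String server : topLocations) {
            Integer idx = serversToIndex.get(server);
            if (idx != null) { // guard: avoids the auto-unboxing NPE
                out.add(idx);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> clusterState = new LinkedHashMap<>();
        clusterState.put("rs1", List.of("region-a"));
        clusterState.put("rs2", List.of("region-b"));
        Map<String, Integer> idx = buildIndex(clusterState);
        System.out.println(resolve(List.of("rs2", "unknown-rs"), idx)); // [1]
    }
}
```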
[jira] [Commented] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions
[ https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635346#comment-13635346 ] ramkrishna.s.vasudevan commented on HBASE-5583: --- If HBASE-5487 takes some more time, can we check this? Master restart on create table with splitkeys does not recreate table with all the splitkey regions --- Key: HBASE-5583 URL: https://issues.apache.org/jira/browse/HBASE-5583 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.95.1 Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, HBASE-5583_new_5_WIP_using_tableznode.patch - Create table using splitkeys - Master goes down before all regions are added to meta - On master restart the table is again enabled, but with fewer regions than specified in the splitkeys. Anyway, the client will get an exception if it had called sync create table. But a table-exists check will say the table exists. Is this scenario to be handled by the client only, or can we have some mechanism on the master side for this? Pls suggest. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions
[ https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635349#comment-13635349 ] Ted Yu commented on HBASE-5583: --- HBASE-5487 has no assignee and no Fix Version. I think we can revive discussion for this JIRA. Master restart on create table with splitkeys does not recreate table with all the splitkey regions --- Key: HBASE-5583 URL: https://issues.apache.org/jira/browse/HBASE-5583 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.95.1 Attachments: HBASE-5583_new_1.patch, HBASE-5583_new_1_review.patch, HBASE-5583_new_2.patch, HBASE-5583_new_4_WIP.patch, HBASE-5583_new_5_WIP_using_tableznode.patch - Create table using splitkeys - Master goes down before all regions are added to meta - On master restart the table is again enabled, but with fewer regions than specified in the splitkeys. Anyway, the client will get an exception if it had called sync create table. But a table-exists check will say the table exists. Is this scenario to be handled by the client only, or can we have some mechanism on the master side for this? Pls suggest. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8353) -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS
[ https://issues.apache.org/jira/browse/HBASE-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635350#comment-13635350 ] Jimmy Xiang commented on HBASE-8353: [~rajesh23], if we don't change the origin, does this mean {quote} +if (regionInfo.isMetaTable() && !serverManager.isServerOnline(data.getOrigin())) { + forceOffline(regionInfo, data); {quote} we always re-assign the meta table when the master restarts, if it's closing? I am ok with not changing the origin since it could be a compatibility issue. -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS Key: HBASE-8353 URL: https://issues.apache.org/jira/browse/HBASE-8353 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.94.6 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.94.8 Attachments: HBASE-8353_94.patch ROOT/META are not getting assigned if the master restarted while closing ROOT/META. Let's suppose catalog table regions are in M_ZK_REGION_CLOSING state during master initialization; then we just add them to RIT and wait for TM. {code} if (isOnDeadServer(regionInfo, deadServers) && (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) { // If was on dead server, its closed now. Force to OFFLINE and this // will get it reassigned if appropriate forceOffline(regionInfo, data); } else { // Just insert region into RIT. // If this never updates the timeout will trigger new assignment regionsInTransition.put(encodedRegionName, new RegionState( regionInfo, RegionState.State.CLOSING, data.getStamp(), data.getOrigin())); } {code} isOnDeadServer always returns false for ROOT/META because deadServers is null. Even TM cannot close them properly because they are not in the online regions, since they are not yet assigned.
{code} synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug("Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
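The condition under discussion can be distilled into a small truth-table sketch (assumed names, not the actual patch): because deadServers is null for -ROOT-/.META. during startup, isOnDeadServer(...) returns false, so a closing meta region needs an explicit check that its origin server is still online before being forced offline.

```java
// Illustrative sketch of the guard being discussed (hypothetical helper,
// not HBase source). Inputs mirror the three facts the startup code knows.
public class MetaOfflineGuard {
    public static boolean shouldForceOffline(boolean onDeadServer,
                                             boolean isMetaRegion,
                                             boolean originServerOnline) {
        // Original condition: region was on a known-dead server => force offline.
        // Proposed addition: a meta region whose origin RS is gone => force
        // offline too, even though deadServers (and thus onDeadServer) is
        // useless for meta at this point.
        return onDeadServer || (isMetaRegion && !originServerOnline);
    }

    public static void main(String[] args) {
        // Meta region, origin RS dead, deadServers null => must still go offline.
        System.out.println(shouldForceOffline(false, true, false)); // true
        // User region whose origin RS is alive stays in RIT.
        System.out.println(shouldForceOffline(false, false, true)); // false
    }
}
```

The compatibility concern in the comment above is orthogonal: this guard changes only when forceOffline fires, not what is written as the origin.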
[jira] [Commented] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
[ https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635351#comment-13635351 ] Himanshu Vashishtha commented on HBASE-6774: [~nkeywal] [~devaraj]: I am interested to know whether there is any progress on this issue (making regions available which do not have a WAL entry, i.e., not waiting for log splitting to finish). I faced this when working on a read-intensive workload. As Nkeywal commented earlier, it is quite useful for some use-cases. There is already a separate WAL for .META., thanks to Devaraj. If you guys are OK, I would like to work on this. Immediate assignment of regions that don't have entries in HLog --- Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon The algo today, after a failure detection, is: - split the logs - when all the logs are split, assign the regions. But some regions can have no entries at all in the HLog. There are many reasons for this: - reference or historical tables: bulk written sometimes, then read only. - sequential rowkeys: in this case most of the regions will be read only, but they can be on a regionserver with a lot of writes. - tables flushed often for safety reasons: I'm thinking about meta here. For meta we can imagine flushing very often. Hence the recovery for meta, in many cases, will be the failure detection time. There are different possible algos: Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; it adds a read. Option 2) The master writes in ZK the number of log files, per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts.
At the end of its split, the region server decreases the counter as well. This allows starting the assign even if not all the HLogs are finished, and would make some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server; requires reading the whole file before starting to write. Option 3) Add some metadata at the end of the log file. The last log file won't have metadata (if we are recovering, it's because the server crashed), but the others will, and the last log file should be smaller (half a block on average). Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much; we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage. I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as of today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
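The counter scheme of Option 2 can be sketched as follows (ZK replaced by an in-memory map, all names assumed, purely illustrative): the master records how many log files may contain edits for each region; splitters decrement as they rule files out, and a region whose counter reaches zero can be assigned before the whole split finishes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Rough sketch of Option 2 above. In the real proposal the counters would
// live in ZK; here a ConcurrentHashMap stands in for illustration.
public class EarlyAssignSketch {
    private final Map<String, AtomicInteger> pendingLogs = new ConcurrentHashMap<>();

    // Master side: region may have edits in up to logFileCount HLog files.
    public void recordRegion(String region, int logFileCount) {
        pendingLogs.put(region, new AtomicInteger(logFileCount));
    }

    // Splitter side: called once a log file is known to hold no edits for
    // (or has been fully replayed for) this region. Returns true when no
    // remaining log file can touch the region, i.e. it is safe to assign.
    public boolean logCleared(String region) {
        return pendingLogs.get(region).decrementAndGet() == 0;
    }

    public static void main(String[] args) {
        EarlyAssignSketch s = new EarlyAssignSketch();
        s.recordRegion("regionA", 2);
        System.out.println(s.logCleared("regionA")); // false: one log file left
        System.out.println(s.logCleared("regionA")); // true: safe to assign early
    }
}
```

This captures the "pro: parallel" property: assignment of untouched regions overlaps with the remaining split work instead of waiting for all of it.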
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635359#comment-13635359 ] Nick Dimiduk commented on HBASE-8329: - If I understand the tickets correctly, all three of them are addressing the same symptoms via slightly different approaches. HBASE-5867 increases the threshold on the number of HFiles that triggers a major compaction. HBASE-3743 introduces the idea of a major compaction manager that orchestrates a rolling compaction across the cluster. This ticket introduces a local compaction rate-limiter, configured on both time of day and IO throughput. [~aoxiang] does that summary sound about right? I personally like the idea of both the macro monitor (HBASE-3743) and the localized throttle (this ticket). This patch would be improved by extracting out the throttling policy interface, with this peak+rate scheme as one implementation. Based on my novice understanding of the storage system, tweaking the threshold on HFile count (HBASE-5867) is a bandaid specific to the current default storage implementation. It will become less applicable as [~sershe]'s work on modularization continues. [~stack], [~nspiegelberg], [~larsgeorge], [~sershe] what's the right approach here? Limit compaction speed -- Key: HBASE-8329 URL: https://issues.apache.org/jira/browse/HBASE-8329 Project: HBase Issue Type: Improvement Components: Compaction Reporter: binlijin Attachments: HBASE-8329-trunk.patch There is no speed or resource limit for compaction; I think we should add this feature, especially when requests burst. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
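The "localized throttle" being proposed is, in essence, a rate limiter on compaction IO. A minimal token-bucket sketch (assumed names, not the patch's actual code) shows the shape such a throttling-policy implementation could take, checked per written chunk rather than per KeyValue:

```java
// Hypothetical sketch of a local compaction throttle: a token bucket capping
// compaction write throughput in bytes/sec. Illustrative only, not HBase code.
public class CompactionThrottle {
    private final long bytesPerSecond; // configured limit, also the burst size
    private double tokens;             // currently available byte budget
    private long lastRefillNanos;

    public CompactionThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.tokens = bytesPerSecond; // start with one second of burst
        this.lastRefillNanos = System.nanoTime();
    }

    // Returns how many milliseconds the compactor should sleep before writing
    // `bytes` more. Refills the bucket from elapsed wall time first.
    public synchronized long throttleMillis(long bytes) {
        long now = System.nanoTime();
        tokens = Math.min(bytesPerSecond,
                tokens + (now - lastRefillNanos) / 1e9 * bytesPerSecond);
        lastRefillNanos = now;
        tokens -= bytes;
        if (tokens >= 0) {
            return 0; // within budget, no delay
        }
        return (long) (-tokens / bytesPerSecond * 1000); // wait out the deficit
    }

    public static void main(String[] args) {
        CompactionThrottle t = new CompactionThrottle(10_000_000L); // 10 MB/s
        System.out.println(t.throttleMillis(5_000_000L));  // within budget: 0
        System.out.println(t.throttleMillis(30_000_000L) > 0); // over budget: true
    }
}
```

Extracting an interface around throttleMillis, as suggested above, would let a peak/off-peak schedule and this rate cap coexist as separate policies.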
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635360#comment-13635360 ] Jean-Marc Spaggiari commented on HBASE-8374: Makes sense! I'm +4 with patch v4. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META.
starting at key '' {noformat} {code} if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8317) Seek returns wrong result with PREFIX_TREE Encoding
[ https://issues.apache.org/jira/browse/HBASE-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635375#comment-13635375 ] ramkrishna.s.vasudevan commented on HBASE-8317: --- I think this can be committed. Seek returns wrong result with PREFIX_TREE Encoding --- Key: HBASE-8317 URL: https://issues.apache.org/jira/browse/HBASE-8317 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-8317-v1.patch, hbase-trunk-8317.patch, hbase-trunk-8317v3.patch TestPrefixTreeEncoding#testSeekWithFixedData from the patch could reproduce the bug. An example of the bug case: Suppose the following rows: 1. row3/c1:q1/ 2. row3/c1:q2/ 3. row3/c1:q3/ 4. row4/c1:q1/ 5. row4/c1:q2/ After seeking the row 'row30', the expected peek KV is row4/c1:q1/, but the actual is row3/c1:q1/. I just fixed this bug case in the patch. Maybe we can do more for other potential problems if anyone is familiar with the PREFIX_TREE code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
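The expected seek semantics from that bug report can be illustrated with a sorted map standing in for the prefix-tree encoder (a sketch, not the encoder's code): seeking a key must position the scanner at the first cell greater than or equal to that key, so seeking "row30" must land on row4, not fall back to row3.

```java
import java.util.TreeMap;

// Sketch of the seek contract the PREFIX_TREE encoder violated, using
// TreeMap.ceilingKey as the reference "seek >= key" behavior.
public class SeekSemantics {
    public static void main(String[] args) {
        TreeMap<String, String> cells = new TreeMap<>();
        cells.put("row3/c1:q1", "v");
        cells.put("row3/c1:q2", "v");
        cells.put("row3/c1:q3", "v");
        cells.put("row4/c1:q1", "v");
        cells.put("row4/c1:q2", "v");
        // '/' sorts before '0', so every row3 cell is < "row30" and every
        // row4 cell is > "row30". The first key >= "row30" is therefore
        // row4/c1:q1; the buggy encoder peeked row3/c1:q1 instead.
        System.out.println(cells.ceilingKey("row30")); // row4/c1:q1
    }
}
```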
[jira] [Commented] (HBASE-7437) Improve CompactSelection
[ https://issues.apache.org/jira/browse/HBASE-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635390#comment-13635390 ] Sergey Shelukhin commented on HBASE-7437: - Good catch w/ Calendar; I looked at the sources, and it appears to be the case indeed. Why do you pass the peak expiration time as the current time? Would it be good to pass a time close to the end of the current hour, for example, based on calendar minutes? It could be imprecise in some freak cases, but with much less object creation. Improve CompactSelection Key: HBASE-7437 URL: https://issues.apache.org/jira/browse/HBASE-7437 Project: HBase Issue Type: Improvement Components: Compaction Reporter: Hiroshi Ikeda Assignee: Hiroshi Ikeda Priority: Minor Attachments: HBASE-7437.patch, HBASE-7437-V2.patch, HBASE-7437-V3.patch, HBASE-7437-V4.patch 1. Using AtomicLong makes CompactSelection simple and improves its performance. 2. There are unused fields and methods. 3. The fields should be private. 4. Assertion in the method finishRequest seems wrong: {code} public void finishRequest() { if (isOffPeakCompaction) { long newValueToLog = -1; synchronized(compactionCountLock) { assert !isOffPeakCompaction : "Double-counting off-peak count for compaction"; {code} The above assertion seems almost always false. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8329) Limit compaction speed
[ https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635397#comment-13635397 ] Sergey Shelukhin commented on HBASE-8329: - 1) Very similar to the off-peak hours tracking; should they be unified? See also HBASE-7437. 2) Calling isPeakHour after every KV seems excessive; can it be improved? Esp. in light of 3. 3) In HBASE-7437 it was mentioned that the calendar object does not auto-update, so getting the time of day actually gets you the time of day when the calendar was first created. That way it wouldn't work, and updating the calendar after every KV would seem to be expensive. 4) Do you have some numbers for throttling? Limit compaction speed -- Key: HBASE-8329 URL: https://issues.apache.org/jira/browse/HBASE-8329 Project: HBase Issue Type: Improvement Components: Compaction Reporter: binlijin Attachments: HBASE-8329-trunk.patch There is no speed or resource limit for compaction; I think we should add this feature, especially when requests burst. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
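The Calendar pitfall raised in points 2 and 3 is easy to demonstrate in isolation (a sketch, unrelated to the patch's code): a `java.util.Calendar` captures the wall clock at `getInstance()` time, so a cached instance keeps reporting the hour at which it was created.

```java
import java.util.Calendar;

// Sketch of the Calendar snapshot behavior discussed above: Calendar does not
// track the clock after creation, so caching one instance gives stale answers.
public class CalendarSnapshot {
    private final Calendar cached = Calendar.getInstance(); // frozen at creation

    // Stale if this object is kept around across hour boundaries.
    public int cachedHour() {
        return cached.get(Calendar.HOUR_OF_DAY);
    }

    // Correct but allocates: take a fresh snapshot per query. A cheap middle
    // ground is to recompute only every few seconds of System.currentTimeMillis.
    public static int currentHour() {
        return Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        System.out.println("current hour of day: " + currentHour());
        // A long-lived CalendarSnapshot would keep printing its creation hour,
        // which is exactly why a per-KV isPeakHour check on a cached Calendar
        // would be wrong, and a fresh Calendar per KV would be expensive.
        System.out.println("cached hour at creation: " + new CalendarSnapshot().cachedHour());
    }
}
```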
[jira] [Commented] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
[ https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635395#comment-13635395 ] Devaraj Das commented on HBASE-6774: I am fine with that, [~v.himanshu].. I guess we should start with a proposal and agree on one (this jira had multiple proposals). Immediate assignment of regions that don't have entries in HLog --- Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Himanshu Vashishtha The algo today, after a failure detection, is: - split the logs - when all the logs are split, assign the regions. But some regions can have no entries at all in the HLog. There are many reasons for this: - reference or historical tables: bulk written sometimes, then read only. - sequential rowkeys: in this case most of the regions will be read only, but they can be on a regionserver with a lot of writes. - tables flushed often for safety reasons: I'm thinking about meta here. For meta we can imagine flushing very often. Hence the recovery for meta, in many cases, will be the failure detection time. There are different possible algos: Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; it adds a read. Option 2) The master writes in ZK the number of log files, per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows starting the assign even if not all the HLogs are finished, and would make some regions available even if we have an issue in one of the log files.
Pro: parallel. Cons: adds work for the region server; requires reading the whole file before starting to write. Option 3) Add some metadata at the end of the log file. The last log file won't have metadata (if we are recovering, it's because the server crashed), but the others will, and the last log file should be smaller (half a block on average). Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much; we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage. I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as of today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7437) Improve CompactSelection
[ https://issues.apache.org/jira/browse/HBASE-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635392#comment-13635392 ] Sergey Shelukhin commented on HBASE-7437: - Sorry for the long response time; it fell off my radar. Improve CompactSelection Key: HBASE-7437 URL: https://issues.apache.org/jira/browse/HBASE-7437 Project: HBase Issue Type: Improvement Components: Compaction Reporter: Hiroshi Ikeda Assignee: Hiroshi Ikeda Priority: Minor Attachments: HBASE-7437.patch, HBASE-7437-V2.patch, HBASE-7437-V3.patch, HBASE-7437-V4.patch 1. Using AtomicLong makes CompactSelection simple and improves its performance. 2. There are unused fields and methods. 3. The fields should be private. 4. Assertion in the method finishRequest seems wrong: {code} public void finishRequest() { if (isOffPeakCompaction) { long newValueToLog = -1; synchronized(compactionCountLock) { assert !isOffPeakCompaction : "Double-counting off-peak count for compaction"; {code} The above assertion seems almost always false. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
[ https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635384#comment-13635384 ] Nicolas Liochon commented on HBASE-6774: Ok for me of course :-). Thanks for this. I don't have an ideal solution in mind; I guess there is some design work to do here, but maybe Devaraj is more advanced than me. I assigned the jira to you in case you don't have the rights for this. Immediate assignment of regions that don't have entries in HLog --- Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon The algo today, after a failure detection, is: - split the logs - when all the logs are split, assign the regions. But some regions can have no entries at all in the HLog. There are many reasons for this: - reference or historical tables: bulk written sometimes, then read only. - sequential rowkeys: in this case most of the regions will be read only, but they can be on a regionserver with a lot of writes. - tables flushed often for safety reasons: I'm thinking about meta here. For meta we can imagine flushing very often. Hence the recovery for meta, in many cases, will be the failure detection time. There are different possible algos: Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; it adds a read. Option 2) The master writes in ZK the number of log files, per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows starting the assign even if not all the HLogs are finished,
and would make some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server; requires reading the whole file before starting to write. Option 3) Add some metadata at the end of the log file. The last log file won't have metadata (if we are recovering, it's because the server crashed), but the others will, and the last log file should be smaller (half a block on average). Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much; we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage. I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as of today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
[ https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-6774: --- Assignee: Himanshu Vashishtha Immediate assignment of regions that don't have entries in HLog --- Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Himanshu Vashishtha The algo today, after a failure detection, is: - split the logs - when all the logs are split, assign the regions. But some regions can have no entries at all in the HLog. There are many reasons for this: - reference or historical tables: bulk written sometimes, then read only. - sequential rowkeys: in this case most of the regions will be read only, but they can be on a regionserver with a lot of writes. - tables flushed often for safety reasons: I'm thinking about meta here. For meta we can imagine flushing very often. Hence the recovery for meta, in many cases, will be the failure detection time. There are different possible algos: Option 1) A new task is added, in parallel with the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; it adds a read. Option 2) The master writes in ZK the number of log files, per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows starting the assign even if not all the HLogs are finished, and would make some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server; requires reading the whole file before starting to write. Option 3) Add some metadata at the end of the log file.
The last log file won't have metadata (if we are recovering, it's because the server crashed), but the others will, and the last log file should be smaller (half a block on average). Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much; we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage. I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as of today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7239) Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs
[ https://issues.apache.org/jira/browse/HBASE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-7239: --- Status: Patch Available (was: Open) Let's see what hadoopqa says. Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs - Key: HBASE-7239 URL: https://issues.apache.org/jira/browse/HBASE-7239 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Priority: Critical Fix For: 0.95.1 Attachments: 7239-1.patch Result.readFields() used to read from the input stream in 8k chunks to avoid OOM issues with direct memory. (Reading variable-sized chunks into direct memory prevents the JVM from reusing the allocated direct memory, and direct memory is only collected during full GCs.) This is just to verify that protobuf's parseFrom-type methods do the right thing as well, so that we do not reintroduce this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8279) Performance Evaluation does not consider the args passed in case of more than one client
[ https://issues.apache.org/jira/browse/HBASE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635409#comment-13635409 ] Lars Hofhansl commented on HBASE-8279: -- +1 Performance Evaluation does not consider the args passed in case of more than one client Key: HBASE-8279 URL: https://issues.apache.org/jira/browse/HBASE-8279 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8279_1.patch, HBASE-8279_2.patch, HBASE-8279.patch Performance evaluation gives a provision to pass the table name. The table name is considered when we first initialize the table: the disabling and creation of tables happens with the name that we pass. But the write and read tests again use only the default table, and so the perf evaluation fails. I think the problem is like this {code} ./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --table=MyTable2 --presplit=70 randomRead 2 {code} {code} 13/04/04 21:42:07 DEBUG hbase.HRegionInfo: Current INFO from scan results = {NAME => 'MyTable2,0002067171,1365126124904.bc9e936f4f8ca8ee55eb90091d4a13b6.', STARTKEY => '0002067171', ENDKEY => '', ENCODED => bc9e936f4f8ca8ee55eb90091d4a13b6,} 13/04/04 21:42:07 INFO hbase.PerformanceEvaluation: Table created with 70 splits {code} You can see that the specified table is created with the splits.
But when the read starts: {code} Caused by: org.apache.hadoop.hbase.exceptions.TableNotFoundException: TestTable
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1157)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1034)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:984)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:246)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:187)
at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testSetup(PerformanceEvaluation.java:851)
at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:869)
at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1495)
at org.apache.hadoop.hbase.PerformanceEvaluation$1.run(PerformanceEvaluation.java:590) {code} It says TestTable, the default table, was not found.
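The shape of the fix the report calls for can be sketched very simply (hypothetical names — resolveTableName, cmdTableName, and DEFAULT_TABLE are illustrative, not PerformanceEvaluation's actual fields): resolve the table name once from the parsed options and hand that same name to every client test, instead of letting the read/write tests fall back to the hardcoded default.

```java
public class TableNameResolution {
    // Hypothetical default, mirroring the "TestTable" in the stack trace above.
    static final String DEFAULT_TABLE = "TestTable";

    // Resolve once from the parsed --table option; every client test should
    // receive this value rather than re-deriving the default on its own.
    static String resolveTableName(String cmdTableName) {
        return (cmdTableName == null || cmdTableName.isEmpty()) ? DEFAULT_TABLE : cmdTableName;
    }

    public static void main(String[] args) {
        System.out.println(resolveTableName("MyTable2")); // the --table=MyTable2 case
        System.out.println(resolveTableName(null));       // no --table given
    }
}
```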
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635412#comment-13635412 ] Hadoop QA commented on HBASE-8374: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579361/8374-trunk-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5346//console This message is automatically generated. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. 
When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think that should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException
at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145)
at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194)
at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295)
at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48)
at org.apache.hadoop.hbase.Chore.run(Chore.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key '' {noformat} {code} if (regionFinder != null) { // region location
List<ServerName> loc = regionFinder.getTopBlockLocations(region);
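The failure pattern suggested by the stack trace can be sketched defensively as follows — an illustration only, under the assumption that the location lookup can return null for a region with no known locations; the surrounding class and method names are hypothetical stand-ins, not the balancer's real API:

```java
import java.util.Collections;
import java.util.List;

public class TopLocations {
    // Stand-in for the balancer's region-location lookup: a null result must be
    // normalized to an empty list before the caller indexes into it, otherwise
    // the Cluster constructor hits an NPE like the one in the trace above.
    static List<String> topBlockLocations(List<String> found) {
        return (found == null) ? Collections.emptyList() : found;
    }

    public static void main(String[] args) {
        System.out.println(topBlockLocations(null).size()); // prints 0 instead of throwing
    }
}
```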
[jira] [Created] (HBASE-8375) Streamline Table durability settings
Lars Hofhansl created HBASE-8375: Summary: Streamline Table durability settings Key: HBASE-8375 URL: https://issues.apache.org/jira/browse/HBASE-8375 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl HBASE-7801 introduces the notion of per-mutation fine-grained durability settings. This issue is to consider and discuss the same for the per-table settings (i.e. what would be used if the mutation indicates USE_DEFAULT). I propose the following settings per table: * SKIP_WAL (i.e. an unlogged table) * ASYNC_WAL (the current deferred log flush) * SYNC_WAL (the current default) * FSYNC_WAL (for future uses of HDFS' hsync())
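The four proposed per-table settings map naturally onto an ordered enum; a sketch (the class name and comments are illustrative — this is not claiming to be the type HBase actually ships):

```java
public class TableDurability {
    // The four proposed per-table durability levels, ordered from weakest to
    // strongest guarantee, as listed in the proposal above.
    enum Durability {
        SKIP_WAL,   // unlogged table: mutations never hit the WAL
        ASYNC_WAL,  // deferred log flush: WAL written, synced in the background
        SYNC_WAL,   // current default: WAL synced before the call returns
        FSYNC_WAL   // for future HDFS hsync(): force to disk, not just to the pipeline
    }

    public static void main(String[] args) {
        // Declaring them weakest-to-strongest lets code compare guarantees by ordinal.
        System.out.println(Durability.SYNC_WAL.compareTo(Durability.SKIP_WAL) > 0); // prints true
    }
}
```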
[jira] [Updated] (HBASE-8353) -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS
[ https://issues.apache.org/jira/browse/HBASE-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-8353: -- Attachment: HBASE-8353_94_2.patch -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS Key: HBASE-8353 URL: https://issues.apache.org/jira/browse/HBASE-8353 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.94.6 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.94.8 Attachments: HBASE-8353_94_2.patch, HBASE-8353_94.patch ROOT/META are not getting assigned if the master restarted while closing ROOT/META. Let's suppose the catalog table regions are in M_ZK_REGION_CLOSING state during master initialization; then we just add them to RIT and wait for the TimeoutMonitor (TM). {code} if (isOnDeadServer(regionInfo, deadServers)
    && (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
  // If was on dead server, its closed now. Force to OFFLINE and this
  // will get it reassigned if appropriate
  forceOffline(regionInfo, data);
} else {
  // Just insert region into RIT.
  // If this never updates the timeout will trigger new assignment
  regionsInTransition.put(encodedRegionName, new RegionState(
      regionInfo, RegionState.State.CLOSING, data.getStamp(), data.getOrigin()));
} {code} isOnDeadServer always returns false for ROOT/META because deadServers is null. Even the TM cannot close them properly, because they are not in the online regions list since they are not yet assigned. {code} synchronized (this.regions) {
  // Check if this region is currently assigned
  if (!regions.containsKey(region)) {
    LOG.debug("Attempted to unassign region " + region.getRegionNameAsString()
        + " but it is not currently assigned anywhere");
    return;
  }
} {code}
[jira] [Commented] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635421#comment-13635421 ] Himanshu Vashishtha commented on HBASE-6970: [~nkeywal] Can you please tell why we are not deleting the pid file after removing the znode? hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Nicolas Liochon We just ran into a strange issue where we could neither start nor stop services with hbase-daemon.sh. The problem is this: {code} nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \
    --config ${HBASE_CONF_DIR} \
    $command "$@" $startStop > "$logout" 2>&1 < /dev/null &
echo $! > $pid {code} So the pid file is created or updated even when the start of the service failed. The next stop command will then fail, because the pid file has the wrong pid in it. Edit: Spelling and more spelling errors.
[jira] [Commented] (HBASE-8279) Performance Evaluation does not consider the args passed in case of more than one client
[ https://issues.apache.org/jira/browse/HBASE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635426#comment-13635426 ] Hadoop QA commented on HBASE-8279: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579362/HBASE-8279_2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5347//console This message is automatically generated. Performance Evaluation does not consider the args passed in case of more than one client Key: HBASE-8279 URL: https://issues.apache.org/jira/browse/HBASE-8279 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8279_1.patch, HBASE-8279_2.patch, HBASE-8279.patch Performance evaluation gives a provision to pass the table name. 
The table name is considered when we first initialize the table - the disabling and creation of tables happens with the name that we pass. But the write and read tests again use only the default table, so the perf evaluation fails. I think the problem is like this {code} ./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --table=MyTable2 --presplit=70 randomRead 2 {code} {code} 13/04/04 21:42:07 DEBUG hbase.HRegionInfo: Current INFO from scan results = {NAME => 'MyTable2,0002067171,1365126124904.bc9e936f4f8ca8ee55eb90091d4a13b6.', STARTKEY => '0002067171', ENDKEY => '', ENCODED => bc9e936f4f8ca8ee55eb90091d4a13b6,} 13/04/04 21:42:07 INFO hbase.PerformanceEvaluation: Table created with 70 splits {code} You can see that the specified table is created with the splits. But when the read starts {code} Caused by: org.apache.hadoop.hbase.exceptions.TableNotFoundException: TestTable
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1157)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1034)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:984) at
[jira] [Commented] (HBASE-6774) Immediate assignment of regions that don't have entries in HLog
[ https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635427#comment-13635427 ] Lars Hofhansl commented on HBASE-6774: -- This mingles (somewhat at least) with HBASE-8375, which I just opened. One of the options proposed there is unlogged tables (tables that never write WAL entries). All regions of those tables could be assigned immediately. Immediate assignment of regions that don't have entries in HLog --- Key: HBASE-6774 URL: https://issues.apache.org/jira/browse/HBASE-6774 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Himanshu Vashishtha Today the algorithm, after a failure detection, is: - split the logs - when all the logs are split, assign the regions. But some regions can have no entries at all in the HLog. There are many reasons for this: - reference or historical tables: bulk-written sometimes, then read-only. - sequential rowkeys: in this case, most of the regions will be read-only, but they can be on a regionserver with a lot of writes. - tables flushed often for safety reasons. I'm thinking about meta here. For meta, we can imagine flushing very often. Hence the recovery for meta, in many cases, will be just the failure detection time. There are different possible algorithms: Option 1) A new task is added, in parallel to the split. This task reads all the HLogs. If there is no entry for a region, this region is assigned. Pro: simple. Cons: we will need to read all the files; adds a read. Option 2) The master writes in ZK the number of log files, per region. When the regionserver starts the split, it reads the full block (64M) and decreases the log file counter of the region. If it reaches 0, the assign starts. At the end of its split, the region server decreases the counter as well. This allows starting the assign even if not all the HLogs are finished. 
It would allow making some regions available even if we have an issue in one of the log files. Pro: parallel. Cons: adds work for the region server; requires reading the whole file before starting to write. Option 3) Add some metadata at the end of the log file. The last log file won't have metadata — if we are recovering, it's because the server crashed — but the others will, and the last log file should be smaller (half a block on average). Option 4) Still some metadata, but in a different file. Cons: writes are increased (but not by much; we just need to write the region once). Pros: if we lose the HLog files (major failure, no replica available) we can still continue with the regions that were not written at this stage. I think it should be done, even if none of the algorithms above is totally convincing yet. It's linked as well to locality and short-circuit reads: with these two points, reading the file twice becomes much less of an issue, for example. My current preference would be to open the file twice in the region server: once for splitting as of today, once for a quick read looking for unused regions. Who knows, maybe it would even be faster this way; the quick-read thread would warm up the different caches for the splitting thread.
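The bookkeeping in Option 2 can be sketched with per-region counters — an illustration only, with ZooKeeper replaced by an in-memory map and all names hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RegionLogCounter {
    // region name -> number of log files that may still contain entries for it.
    private final Map<String, AtomicInteger> pending = new ConcurrentHashMap<>();

    // Master side: record how many log files could reference this region.
    void expect(String region, int logFiles) {
        pending.put(region, new AtomicInteger(logFiles));
    }

    // Splitter side: called when one log file has been fully scanned for `region`.
    // Returns true when no log file can still hold entries for it, i.e. the
    // region can be assigned before the rest of the split finishes.
    boolean logDone(String region) {
        AtomicInteger c = pending.get(region);
        return c != null && c.decrementAndGet() == 0;
    }

    public static void main(String[] args) {
        RegionLogCounter counter = new RegionLogCounter();
        counter.expect("region-A", 2);
        System.out.println(counter.logDone("region-A")); // prints false (one file left)
        System.out.println(counter.logDone("region-A")); // prints true (safe to assign)
    }
}
```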
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635428#comment-13635428 ] Enis Soztutar commented on HBASE-8369: -- bq. in general I'm against having another way to direct access the data, since it means that you're giving up on optimizing the main one. Conceptually, this is similar to the short circuit reads for HDFS. I agree that we should not need these kinds of optimizations, since in the long term, it will be impossible to implement QoS for IO if you give direct access to local files (for SSR) / hdfs files (for snapshot). bq. if the final implementation will be like this one using the HRegion object, I'll be +1. Yes, that is the plan. bq. Are the initCredentials modifications in TableMapReduceUtil required for the scope of this patch? Yes, we do not need to initCredentials, since we are not talking to any hbase server. MapReduce over snapshot files - Key: HBASE-8369 URL: https://issues.apache.org/jira/browse/HBASE-8369 Project: HBase Issue Type: New Feature Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.95.2 Attachments: hbase-8369_v0.patch The idea is to add an InputFormat, which can run the mapreduce job over snapshot files directly bypassing hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits. Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok: - Take snapshots periodically, and run MR jobs only on snapshots. - Export snapshots to remote hdfs cluster, run the MR jobs at that cluster without HBase cluster. 
- (Future use case) Combine snapshot data with online hbase data: Scan from yesterday's snapshot, but read today's data from online hbase cluster.
[jira] [Commented] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635431#comment-13635431 ] Nicolas Liochon commented on HBASE-6970: In the java code, you mean? Is it in the context of this jira? hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Nicolas Liochon We just ran into a strange issue where we could neither start nor stop services with hbase-daemon.sh. The problem is this: {code} nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \
    --config ${HBASE_CONF_DIR} \
    $command "$@" $startStop > "$logout" 2>&1 < /dev/null &
echo $! > $pid {code} So the pid file is created or updated even when the start of the service failed. The next stop command will then fail, because the pid file has the wrong pid in it. Edit: Spelling and more spelling errors.
[jira] [Commented] (HBASE-8353) -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS
[ https://issues.apache.org/jira/browse/HBASE-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635438#comment-13635438 ] rajeshbabu commented on HBASE-8353: --- [~jxiang] bq. We always re-assign the meta table when the master restarts, if it's closing? Yes, we will re-assign even if the RS holding the catalog region is not dead. @Ram, with the first patch we are not able to identify whether the origin is dead or not, which may cause double assignments. In the latest patch I tried to handle it without changing the origin. What do you say about the latest patch? -ROOT-/.META. regions are hanging if master restarted while closing -ROOT-/.META. regions on dead RS Key: HBASE-8353 URL: https://issues.apache.org/jira/browse/HBASE-8353 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.94.6 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.94.8 Attachments: HBASE-8353_94_2.patch, HBASE-8353_94.patch ROOT/META are not getting assigned if the master restarted while closing ROOT/META. Let's suppose the catalog table regions are in M_ZK_REGION_CLOSING state during master initialization; then we just add them to RIT and wait for the TimeoutMonitor (TM). {code} if (isOnDeadServer(regionInfo, deadServers)
    && (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
  // If was on dead server, its closed now. Force to OFFLINE and this
  // will get it reassigned if appropriate
  forceOffline(regionInfo, data);
} else {
  // Just insert region into RIT.
  // If this never updates the timeout will trigger new assignment
  regionsInTransition.put(encodedRegionName, new RegionState(
      regionInfo, RegionState.State.CLOSING, data.getStamp(), data.getOrigin()));
} {code} isOnDeadServer always returns false for ROOT/META because deadServers is null. Even the TM cannot close them properly, because they are not in the online regions list since they are not yet assigned. 
{code} synchronized (this.regions) {
  // Check if this region is currently assigned
  if (!regions.containsKey(region)) {
    LOG.debug("Attempted to unassign region " + region.getRegionNameAsString()
        + " but it is not currently assigned anywhere");
    return;
  }
} {code}
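The root cause here — a check that silently degrades when its input is null — can be sketched as follows. This is a simplified illustration with hypothetical types (the real isOnDeadServer works on HRegionInfo and a map of dead servers), contrasting the failing shape with a null-aware one; it is not the actual patch:

```java
import java.util.Set;

public class DeadServerCheck {
    // Failing shape: a null deadServers set makes the check vacuously false,
    // so a catalog region on a dead server is treated as if its server were alive
    // and just sits in RIT waiting for a close that can never happen.
    static boolean isOnDeadServerBroken(String host, Set<String> deadServers) {
        return deadServers != null && deadServers.contains(host);
    }

    // Null-aware shape for catalog regions: when the dead-server set is unknown,
    // err toward treating the hosting server as dead so the region gets re-assigned.
    static boolean isOnDeadServerSafe(String host, Set<String> deadServers) {
        return deadServers == null || deadServers.contains(host);
    }

    public static void main(String[] args) {
        System.out.println(isOnDeadServerBroken("rs1", null)); // prints false: the bug
        System.out.println(isOnDeadServerSafe("rs1", null));   // prints true: forces re-assignment
    }
}
```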
[jira] [Commented] (HBASE-6739) Single put should avoid batch overhead when autoflush is on
[ https://issues.apache.org/jira/browse/HBASE-6739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635439#comment-13635439 ] Nick Dimiduk commented on HBASE-6739: - Can this be closed as a dupe of HBASE-5824? Single put should avoid batch overhead when autoflush is on Key: HBASE-6739 URL: https://issues.apache.org/jira/browse/HBASE-6739 Project: HBase Issue Type: Improvement Reporter: Jimmy Xiang Priority: Minor Currently, even when autoflush is on, a single put is handled the same way as if autoflush is off: convert the put to multi-action, create a callable, hand it to an executor to process, wait for it to complete. We can avoid this overhead for single put if autoflush is on.
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635453#comment-13635453 ] Gary Helmling commented on HBASE-8369: -- bq. Yes, we do not need to initCredentials, since we are not talking to any hbase server. So would this completely bypass security? I also want this functionality for certain use cases, we should just be clear on this caveat. MapReduce over snapshot files - Key: HBASE-8369 URL: https://issues.apache.org/jira/browse/HBASE-8369 Project: HBase Issue Type: New Feature Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.95.2 Attachments: hbase-8369_v0.patch The idea is to add an InputFormat, which can run the mapreduce job over snapshot files directly bypassing hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits. Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok: - Take snapshots periodically, and run MR jobs only on snapshots. - Export snapshots to remote hdfs cluster, run the MR jobs at that cluster without HBase cluster. - (Future use case) Combine snapshot data with online hbase data: Scan from yesterday's snapshot, but read today's data from online hbase cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8375) Streamline Table durability settings
[ https://issues.apache.org/jira/browse/HBASE-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635456#comment-13635456 ] Devaraj Das commented on HBASE-8375: I think we should seriously consider HBASE-5930 in the context of these "don't write to WAL" changes. That jira would have done at least some damage control in the event of node failures. Streamline Table durability settings Key: HBASE-8375 URL: https://issues.apache.org/jira/browse/HBASE-8375 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl HBASE-7801 introduces the notion of per-mutation fine-grained durability settings. This issue is to consider and discuss the same for the per-table settings (i.e. what would be used if the mutation indicates USE_DEFAULT). I propose the following settings per table: * SKIP_WAL (i.e. an unlogged table) * ASYNC_WAL (the current deferred log flush) * SYNC_WAL (the current default) * FSYNC_WAL (for future uses of HDFS' hsync())
[jira] [Commented] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635460#comment-13635460 ] Himanshu Vashishtha commented on HBASE-6970: No, in the hbase-daemon.sh script, there is a clearZNode() method which deletes the rs znode, but the pid file is kept intact. I wonder why this is so. hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Nicolas Liochon We just ran into a strange issue where we could neither start nor stop services with hbase-daemon.sh. The problem is this: {code} nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \
    --config ${HBASE_CONF_DIR} \
    $command "$@" $startStop > "$logout" 2>&1 < /dev/null &
echo $! > $pid {code} So the pid file is created or updated even when the start of the service failed. The next stop command will then fail, because the pid file has the wrong pid in it. Edit: Spelling and more spelling errors.
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635464#comment-13635464 ] Hadoop QA commented on HBASE-8374: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579365/8374-trunk-v4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5348//console This message is automatically generated. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. 
When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange. {noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key '' {noformat} {code} if (regionFinder != null) { // region location List<ServerName> loc = regionFinder.getTopBlockLocations(region);
[jira] [Created] (HBASE-8376) MiniHBaseCluster#waitFor{Master|RegionServer}ToStop should implement timeout.
rajeshbabu created HBASE-8376: - Summary: MiniHBaseCluster#waitFor{Master|RegionServer}ToStop should implement timeout. Key: HBASE-8376 URL: https://issues.apache.org/jira/browse/HBASE-8376 Project: HBase Issue Type: Improvement Components: test Reporter: rajeshbabu Assignee: rajeshbabu Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8376) MiniHBaseCluster#waitFor{Master|RegionServer}ToStop should implement timeout.
[ https://issues.apache.org/jira/browse/HBASE-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-8376: -- Description: Presently we are ignoring the timeout in the waitForMasterToStop and waitForRegionServerToStop methods in MiniHBaseCluster {code} @Override public void waitForRegionServerToStop(ServerName serverName, long timeout) throws IOException { // ignore timeout for now waitOnRegionServer(getRegionServerIndex(serverName)); } {code} {code} @Override public void waitForMasterToStop(ServerName serverName, long timeout) throws IOException { // ignore timeout for now waitOnMaster(getMasterIndex(serverName)); } {code} We can implement the timeout in these methods. MiniHBaseCluster#waitFor{Master|RegionServer}ToStop should implement timeout. - Key: HBASE-8376 URL: https://issues.apache.org/jira/browse/HBASE-8376 Project: HBase Issue Type: Improvement Components: test Reporter: rajeshbabu Assignee: rajeshbabu Priority: Minor Presently we are ignoring the timeout in the waitForMasterToStop and waitForRegionServerToStop methods in MiniHBaseCluster {code} @Override public void waitForRegionServerToStop(ServerName serverName, long timeout) throws IOException { // ignore timeout for now waitOnRegionServer(getRegionServerIndex(serverName)); } {code} {code} @Override public void waitForMasterToStop(ServerName serverName, long timeout) throws IOException { // ignore timeout for now waitOnMaster(getMasterIndex(serverName)); } {code} We can implement the timeout in these methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
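The timeout-aware wait proposed above can be sketched as a simple deadline loop. This is illustrative only, not the eventual patch: the `BooleanSupplier` stands in for the real liveness check (`waitOnRegionServer` / `waitOnMaster` internals are not modeled), and the class and method names are made up.

```java
import java.util.function.BooleanSupplier;

// Illustrative deadline loop for waitFor{Master|RegionServer}ToStop.
// 'stillRunning' is a hypothetical stand-in for the real liveness check.
class TimeoutWait {
  static boolean waitUntilStopped(BooleanSupplier stillRunning, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (stillRunning.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;                           // timed out: caller decides how to fail
      }
      Thread.sleep(Math.min(100, timeoutMs));   // poll at a small interval
    }
    return true;                                // observed as stopped before the deadline
  }
}
```

A caller could then throw an IOException (or fail the test) when `false` comes back instead of blocking forever.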
[jira] [Commented] (HBASE-8375) Streamline Table durability settings
[ https://issues.apache.org/jira/browse/HBASE-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635482#comment-13635482 ] Lars Hofhansl commented on HBASE-8375: -- Agreed. Streamline Table durability settings Key: HBASE-8375 URL: https://issues.apache.org/jira/browse/HBASE-8375 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl HBASE-7801 introduces the notion of per-mutation fine-grained durability settings. This issue is to consider and discuss the same for the per-table setting (i.e. what would be used if the mutation indicates USE_DEFAULT). I propose the following settings per table: * SKIP_WAL (i.e. an unlogged table) * ASYNC_WAL (the current deferred log flush) * SYNC_WAL (the current default) * FSYNC_WAL (for future uses of HDFS' hsync()) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
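The USE_DEFAULT fallback described in the proposal can be sketched as a plain resolution step. The enum values mirror the four proposed table settings; the `resolve()` helper and class name are illustrative, not HBase API.

```java
// Enum values mirror the proposed per-table settings; USE_DEFAULT only makes
// sense on a mutation. resolve() is an illustrative helper, not HBase API.
class DurabilityExample {
  enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

  // A mutation that says USE_DEFAULT falls back to the table-level setting.
  static Durability resolve(Durability mutationSetting, Durability tableDefault) {
    return mutationSetting == Durability.USE_DEFAULT ? tableDefault : mutationSetting;
  }
}
```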
[jira] [Commented] (HBASE-8365) Duplicated ZK notifications cause Master abort (or other unknown issues)
[ https://issues.apache.org/jira/browse/HBASE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635488#comment-13635488 ] Jeffrey Zhong commented on HBASE-8365: -- {quote} nodeDataChangeEvent only will give the latest data because it will not be able to read the old data {quote} ZooKeeper intentionally only sends out notifications without passing the original state which triggered the notification. It relies on clients to fetch the latest state. In addition, a ZooKeeper watcher is a one-time trigger, which means it only fires once and the client needs to re-set the watcher on the same znode to get the next notification. In our case, from the log, the related updates with a watcher set on the region are: 1) opening -> opening 2) opening -> failed_open 3) failed_open -> offline 4) offline -> opening The first notification (when we got FAILED_OPEN) is triggered by the update opening -> opening. When the Master got the notification, the znode had already changed to failed_open; that's the first nodeDataChange trace. The thing that puzzles me is that the ZooKeeper watcher is re-set on the failed_open state after receiving the first failed_open, so it should only get more notifications when the failed_open state changes. Yet we still get one more failed_open later from the same znode, and the data has the same version as when we received the first notification. I guess the ZK client may read stale cached data when the node state changes from failed_open -> offline, or race conditions on the ZK side cause the dup notifications. Duplicated ZK notifications cause Master abort (or other unknown issues) Key: HBASE-8365 URL: https://issues.apache.org/jira/browse/HBASE-8365 Project: HBase Issue Type: Bug Affects Versions: 0.94.6 Reporter: Jeffrey Zhong Attachments: TestResult.txt The duplicated ZK notifications should happen in trunk as well. Since the way we handle ZK notifications is different in trunk, we don't see the issue there. I'll explain later.
The issue is causing TestMetaReaderEditor.testRetrying to be flaky with the error message {code}reader: count=2, t=null{code} A related link is at https://builds.apache.org/job/HBase-0.94/941/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ The test case failure is due to an IllegalStateException; the master is aborted, so the remaining test cases also fail after testRetrying. Below are the steps showing why the issue happens (region fa0e7a5590feb69bd065fbc99c228b36 is the one of interest): 1) Got the first notification event RS_ZK_REGION_FAILED_OPEN at 2013-04-04 17:39:01,197 {code} DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN, server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36{code} In this step, AM tries to open the region on another RS in a separate thread 2) Got a second notification event RS_ZK_REGION_FAILED_OPEN at 2013-04-04 17:39:01,200 {code}DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_FAILED_OPEN, server=janus.apache.org,42093,1365097126155, region=fa0e7a5590feb69bd065fbc99c228b36{code} 3) Later got the opening notification event resulting from step 1 at 2013-04-04 17:39:01,288 {code} DEBUG [pool-1-thread-1-EventThread] master.AssignmentManager(744): Handling transition=RS_ZK_REGION_OPENING, server=janus.apache.org,54833,1365097126175, region=fa0e7a5590feb69bd065fbc99c228b36{code} In step 2, ClosedRegionHandler throws an IllegalStateException because it cannot transition the region to OFFLINE (the state is OPENING from notification 3) and aborts the Master. This could happen in 0.94 because we handle notifications using an executorService, which opens the door to handling events out of order even though we receive them in the order of the updates. I've confirmed that we don't have duplicated AM listeners and that both events were triggered by the same ZK data of the exact same version. The issue can be reproduced about once when running the testRetrying test case 20 times in a loop.
There are several issues behind the failure: 1) Duplicated ZK notifications. Since a ZK watcher is a one-time trigger, duplicated notifications should not happen from the same data of the same version in the first place 2) ZooKeeper watcher handling is wrong in both 0.94 and trunk, as follows: a) 0.94 handles notifications asynchronously, which may lead to handling them out of the order in which the events happened b) In trunk, we handle ZK notifications synchronously, which slows down other components such as SSH, LogSplitting etc. because we have a single notification queue c) In trunk and 0.94, we could use stale event data because we have a long listener list. The ZK node state could have changed at the time when
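Since both duplicated events above carried the same znode data version, one defensive option is for a listener to remember the last version it handled per znode and drop notifications that do not advance it. A minimal sketch under that assumption (this is not the AssignmentManager's actual logic, and the class name is made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative guard against duplicated ZK notifications: remember the last
// data version handled per znode and skip events that do not advance it.
class VersionDedup {
  private final Map<String, Integer> lastSeen = new ConcurrentHashMap<>();

  /** Returns true if this (znode, version) notification should be processed. */
  boolean shouldHandle(String znode, int dataVersion) {
    Integer prev = lastSeen.get(znode);
    if (prev != null && dataVersion <= prev) {
      return false;            // duplicate or stale notification: drop it
    }
    lastSeen.put(znode, dataVersion);
    return true;
  }
}
```

Note the get/put pair is not atomic; a real implementation on a multi-threaded event path would need a compare-and-set loop or per-znode locking.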
[jira] [Updated] (HBASE-8365) Duplicated ZK notifications cause Master abort (or other unknown issues)
[ https://issues.apache.org/jira/browse/HBASE-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-8365: -- Attachment: TestZookeeper.txt Recently I have also observed duplicated zk notifications. FYI, attaching logs; these may also be useful for analysis. I have tried to debug but was not able to reproduce it. {code} 2013-04-07 12:14:50,735 INFO [hbase-am-zkevent-worker-pool-20-thread-7] master.RegionStates(264): Region {NAME => 'testLogSplittingAfterMasterRecoveryDueToZKExpiry,,1365336889784.fb4182aef4ce07f011871ae0a083aee0.', STARTKEY => '', ENDKEY => '1', ENCODED => fb4182aef4ce07f011871ae0a083aee0,} transitioned from {testLogSplittingAfterMasterRecoveryDueToZKExpiry,,1365336889784.fb4182aef4ce07f011871ae0a083aee0. state=OPENING, ts=1365336890719, server=asf001.sp2.ygridcore.net,60884,1365336878389} to {testLogSplittingAfterMasterRecoveryDueToZKExpiry,,1365336889784.fb4182aef4ce07f011871ae0a083aee0. state=OPEN, ts=1365336890735, server=asf001.sp2.ygridcore.net,60884,1365336878389} 2013-04-07 12:14:50,735 DEBUG [hbase-am-zkevent-worker-pool-2-thread-20] master.AssignmentManager(740): Handling transition=RS_ZK_REGION_OPENED, server=asf001.sp2.ygridcore.net,60884,1365336878389, region=fb4182aef4ce07f011871ae0a083aee0, current state from region state map = {testLogSplittingAfterMasterRecoveryDueToZKExpiry,,1365336889784.fb4182aef4ce07f011871ae0a083aee0. state=OPEN, ts=1365336890727, server=asf001.sp2.ygridcore.net,60884,1365336878389} 2013-04-07 12:14:50,736 WARN [hbase-am-zkevent-worker-pool-2-thread-20] master.AssignmentManager(934): Received OPENED for region fb4182aef4ce07f011871ae0a083aee0 from server asf001.sp2.ygridcore.net,60884,1365336878389 but region was in the state {testLogSplittingAfterMasterRecoveryDueToZKExpiry,,1365336889784.fb4182aef4ce07f011871ae0a083aee0.
state=OPEN, ts=1365336890727, server=asf001.sp2.ygridcore.net,60884,1365336878389} and not in expected PENDING_OPEN or OPENING states, or not on the expected server {code} Duplicated ZK notifications cause Master abort (or other unknown issues) Key: HBASE-8365 URL: https://issues.apache.org/jira/browse/HBASE-8365 Project: HBase Issue Type: Bug Affects Versions: 0.94.6 Reporter: Jeffrey Zhong Attachments: TestResult.txt, TestZookeeper.txt
[jira] [Commented] (HBASE-7239) Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs
[ https://issues.apache.org/jira/browse/HBASE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635531#comment-13635531 ] Hadoop QA commented on HBASE-7239: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579277/7239-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5349//console This message is automatically generated. Verify protobuf serialization is correctly chunking upon read to avoid direct memory OOMs - Key: HBASE-7239 URL: https://issues.apache.org/jira/browse/HBASE-7239 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Priority: Critical Fix For: 0.95.1 Attachments: 7239-1.patch Result.readFields() used to read from the input stream in 8k chunks to avoid OOM issues with direct memory. 
(Reading variable-sized chunks into direct memory prevents the JVM from reusing the allocated direct memory, and direct memory is only collected during full GCs.) This is just to verify that protobuf's parseFrom-type methods do the right thing as well, so that we do not reintroduce this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
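The fixed-size chunking that Result.readFields used can be sketched as follows: copying through one small, constant-size buffer lets the JVM reuse a single direct buffer instead of allocating a new variable-sized one per message. The `readFully` helper and class name are illustrative; only the 8 KB chunk size comes from the description above.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch of fixed-size chunked reads. The 8 KB chunk mirrors the old
// Result.readFields behavior described above; everything else is illustrative.
class ChunkedRead {
  static final int CHUNK = 8 * 1024;

  static byte[] readFully(InputStream in, int len) throws IOException {
    byte[] out = new byte[len];
    byte[] chunk = new byte[CHUNK];        // fixed-size buffer, reused each pass
    int copied = 0;
    while (copied < len) {
      int n = in.read(chunk, 0, Math.min(CHUNK, len - copied));
      if (n < 0) throw new IOException("EOF after " + copied + " bytes");
      System.arraycopy(chunk, 0, out, copied, n);
      copied += n;
    }
    return out;
  }
}
```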
[jira] [Commented] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635536#comment-13635536 ] Himanshu Vashishtha commented on HBASE-6970: I see, the pid file is deleted in case the normal stop command is used. I was using kill -9 rs_proc; in that case, only cleanZnode is called by the script. I wonder, shouldn't we delete the pid in either case (just like deleting the znode), irrespective of how the rs process died? I don't see any benefit of keeping that file. Did I miss anything? Thanks. hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Nicolas Liochon We just ran into a strange issue where we could neither start nor stop services with hbase-deamon.sh. The problem is this: {code} nohup nice -n $HBASE_NICENESS $HBASE_HOME/bin/hbase \ --config ${HBASE_CONF_DIR} \ $command $@ $startStop > $logout 2>&1 < /dev/null & echo $! > $pid {code} So the pid file is created or updated even when the start of the service failed. The next stop command will then fail, because the pid file has the wrong pid in it. Edit: Spelling and more spelling errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
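A guarded variant of the start sequence from the hbase-daemon.sh excerpt could probe the child with `kill -0` before recording its pid. This is a sketch only: `sleep 5` stands in for the real `$HBASE_HOME/bin/hbase ... $startStop` command, and the pid-file path is made up.

```shell
# Sketch: write the pid file only after a liveness check, instead of
# unconditionally. 'sleep 5' stands in for the real hbase start command.
pid="${TMPDIR:-/tmp}/hbase-example.pid"    # hypothetical pid file location
nohup nice -n "${HBASE_NICENESS:-0}" sleep 5 > /dev/null 2>&1 < /dev/null &
child=$!
sleep 1                                    # let an immediately-failing start die first
if kill -0 "$child" 2>/dev/null; then
  echo "$child" > "$pid"                   # start survived: record the pid
else
  echo "start failed; not writing $pid" >&2
fi
```

With this shape, a start that dies right away never overwrites the pid file, so a later stop does not chase the wrong pid.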
[jira] [Commented] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635562#comment-13635562 ] Nicolas Liochon commented on HBASE-6970: the pid file is supposed to be there if the process is still there, so deleting the znode is not enough: we need to be sure that the process died. hbase-deamon.sh creates/updates pid file even when that start failed. - Key: HBASE-6970 URL: https://issues.apache.org/jira/browse/HBASE-6970 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Nicolas Liochon -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8366) HBaseServer logs the full query.
[ https://issues.apache.org/jira/browse/HBASE-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635584#comment-13635584 ] Andrew Purtell commented on HBASE-8366: --- +1! HBaseServer logs the full query. Key: HBASE-8366 URL: https://issues.apache.org/jira/browse/HBASE-8366 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.0, 0.95.1 Attachments: 8366.v1.patch We log the query when we have an error. As a result, the logs are not readable when using stuff like multi. As a side note, this is also a security issue (no need to encrypt the network and the storage if the logs contain everything). I'm not removing the full log line here; but just ask and I'll do it :-). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8366) HBaseServer logs the full query.
[ https://issues.apache.org/jira/browse/HBASE-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635595#comment-13635595 ] Himanshu Vashishtha commented on HBASE-8366: +1. Thanks Stack. HBaseServer logs the full query. Key: HBASE-8366 URL: https://issues.apache.org/jira/browse/HBASE-8366 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.0, 0.95.1 Attachments: 8366.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8352) Rename '.snapshot' directory
[ https://issues.apache.org/jira/browse/HBASE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635675#comment-13635675 ] Tsz Wo (Nicholas), SZE commented on HBASE-8352: --- Thanks, everyone. You guys have done a great job! Rename '.snapshot' directory Key: HBASE-8352 URL: https://issues.apache.org/jira/browse/HBASE-8352 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Blocker Fix For: 0.98.0, 0.94.7, 0.95.1 Attachments: 8352-0.94-v1.txt, 8352-0.94-v2.txt, 8352-0.94-v3.txt, 8352-0.94-v4.txt, 8352-trunk.txt, 8352-trunk-v2.txt, 8352-trunk-v3.txt, 8352-trunk-v4.txt, 8352-trunk-v5.txt, 8352-trunk-v6.txt Testing HBase Snapshot on top of Hadoop's Snapshot branch (http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/), we found that both features used a '.snapshot' directory to store metadata. HDFS (built from the HDFS-2802 branch) doesn't allow paths with .snapshot as a component. From the discussion on d...@hbase.apache.org (see http://search-hadoop.com/m/kY6C3cXMs51), the consensus was to rename the '.snapshot' directory in HBase so that both features can co-exist smoothly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635693#comment-13635693 ] Enis Soztutar commented on HBASE-8374: -- Nice bug. Agree that serversToIndex is not populated first. Also, RegionLocationFinder might return region locations that we do not know about (the RS might have died, and we could be caching the data, etc). We should still guard against serversToIndex.get(loc.get(i)) returning null. For the patch, we should not use boxed primitives (for regionLocations = new int[numRegions][];). We can use -1 to indicate a null value. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think it should be enough. Now, looking at the code, the NPE is strange.
{noformat} 2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key '' {noformat} {code} if (regionFinder != null) { // region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i = 0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here } } {code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
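Enis's suggestion of storing -1 instead of unboxing a possibly-null Integer can be sketched like this. The class and method names are illustrative (not the actual patch), and String stands in for ServerName to keep the sketch self-contained.

```java
import java.util.List;
import java.util.Map;

// Illustrative null-guard for the NPE above: when a location's server is not
// in serversToIndex (dead RS, stale cached locations), record -1 instead of
// unboxing a null Integer.
class LocationIndexExample {
  static int[] toIndexes(List<String> loc, Map<String, Integer> serversToIndex) {
    int[] indexes = new int[loc.size()];
    for (int i = 0; i < loc.size(); i++) {
      Integer idx = serversToIndex.get(loc.get(i));
      indexes[i] = (idx == null) ? -1 : idx;   // -1 marks an unknown server
    }
    return indexes;
  }
}
```

Downstream code then checks for the -1 sentinel rather than relying on every location being a known, live server.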
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk-v5.txt Patch v5 changes regionLocations back to int array. NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt, 8374-trunk-v5.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8374) NPE when launching the balance
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: (was: 8374-trunk-v5.txt) NPE when launching the balance -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt, 8374-trunk-v5.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8374) NPE when launching the balancer
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8374: -- Attachment: 8374-trunk-v5.txt NPE when launching the balancer -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt, 8374-trunk-v5.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. Once it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think that should be enough. Now, looking at the code, the NPE is strange. {noformat}
2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception
java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145)
	at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194)
	at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295)
	at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48)
	at org.apache.hadoop.hbase.Chore.run(Chore.java:81)
	at java.lang.Thread.run(Thread.java:662)
2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
{noformat} {code}
if (regionFinder != null) {
  // region location
  List<ServerName> loc = regionFinder.getTopBlockLocations(region);
  regionLocations[regionIndex] = new int[loc.size()];
  for (int i = 0; i < loc.size(); i++) {
    regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here
  }
}
{code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8369) MapReduce over snapshot files
[ https://issues.apache.org/jira/browse/HBASE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635708#comment-13635708 ] Enis Soztutar commented on HBASE-8369: -- bq. So would this completely bypass security? Underlying hFiles are owned by the hbase user. For reading the files from MR tasks, a couple of options come to my mind: (1) open the files directly from hdfs, in which case the user has to be in the same group and have group permissions to read the files, or the user has to be the hbase user. Similar to current SSR. (2) have HBase servers open the files, and pass the file descriptors to the MR job, similar to the approach in HDFS-347. This is obviously more involved and requires a live HBase cluster. (3) Copy snapshot files as a different user. This will only be applicable to exported snapshots. Copying data for in-place snapshots would be costly. Any other ideas? MapReduce over snapshot files - Key: HBASE-8369 URL: https://issues.apache.org/jira/browse/HBASE-8369 Project: HBase Issue Type: New Feature Components: mapreduce, snapshots Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.95.2 Attachments: hbase-8369_v0.patch The idea is to add an InputFormat, which can run the mapreduce job over snapshot files directly, bypassing the hbase server layer. The IF is similar in usage to TableInputFormat, taking a Scan object from the user, but instead of running from an online table, it runs from a table snapshot. We do one split per region in the snapshot, and open an HRegion inside the RecordReader. A RegionScanner is used internally for doing the scan without any HRegionServer bits. Users have been asking and searching for ways to run MR jobs by reading directly from hfiles, so this allows new use cases if reading from stale data is ok: - Take snapshots periodically, and run MR jobs only on snapshots. - Export snapshots to a remote hdfs cluster, run the MR jobs at that cluster without an HBase cluster. 
- (Future use case) Combine snapshot data with online hbase data: Scan from yesterday's snapshot, but read today's data from online hbase cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
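The one-split-per-region design described above can be sketched generically. Everything here is an illustrative stand-in (the class and method names are hypothetical, not the HBase API): the point is only that the split count is determined by the snapshot's region list, so each RecordReader can open one region's files and scan it without a region server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "one input split per region in the snapshot".
public class SnapshotSplitSketch {
    // Stand-in for a region's [startKey, endKey) range from the snapshot manifest.
    static class RegionSlice {
        final String startKey;
        final String endKey;
        RegionSlice(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Mirrors an InputFormat's getSplits(): emit exactly one split per region,
    // so split count == region count and each reader scans one region locally.
    static List<String> splitsForSnapshot(List<RegionSlice> regions) {
        List<String> splits = new ArrayList<>();
        for (RegionSlice r : regions) {
            splits.add("split[" + r.startKey + "," + r.endKey + ")");
        }
        return splits;
    }

    public static void main(String[] args) {
        List<RegionSlice> regions = new ArrayList<>();
        regions.add(new RegionSlice("", "bbb"));
        regions.add(new RegionSlice("bbb", "mmm"));
        regions.add(new RegionSlice("mmm", ""));
        System.out.println(splitsForSnapshot(regions));
    }
}
```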
[jira] [Commented] (HBASE-8374) NPE when launching the balancer
[ https://issues.apache.org/jira/browse/HBASE-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635709#comment-13635709 ] Enis Soztutar commented on HBASE-8374: -- lgtm. Thanks Ted for taking this on. Nicolas, any chance you can try this with the cluster? NPE when launching the balancer -- Key: HBASE-8374 URL: https://issues.apache.org/jira/browse/HBASE-8374 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.95.0 Environment: AWS / real cluster with 3 nodes + master Reporter: Nicolas Liochon Assignee: Ted Yu Fix For: 0.98.0, 0.95.1 Attachments: 8374-trunk.txt, 8374-trunk-v2.txt, 8374-trunk-v3.txt, 8374-trunk-v4.txt, 8374-trunk-v5.txt I don't reproduce this all the time, but I had it on a fairly clean env. It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. Once it starts to occur, it occurs all the time. I haven't tried to restart the master, but I think that should be enough. Now, looking at the code, the NPE is strange. {noformat}
2013-04-18 08:09:52,079 ERROR [box,6,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception
java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145)
	at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194)
	at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295)
	at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48)
	at org.apache.hadoop.hbase.Chore.run(Chore.java:81)
	at java.lang.Thread.run(Thread.java:662)
2013-04-18 08:09:52,103 DEBUG [box,6,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
{noformat} {code}
if (regionFinder != null) {
  // region location
  List<ServerName> loc = regionFinder.getTopBlockLocations(region);
  regionLocations[regionIndex] = new int[loc.size()];
  for (int i = 0; i < loc.size(); i++) {
    regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <= NPE here
  }
}
{code} pinging [~enis], just in case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8377) IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly
Enis Soztutar created HBASE-8377: Summary: IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly Key: HBASE-8377 URL: https://issues.apache.org/jira/browse/HBASE-8377 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 There is a bug in IntegrationTestBigLinkedList: it reads the wrong config key to calculate the wrap size for the linked list. It uses num mappers instead of num records per mapper. This has not been caught before, because it causes the test to fail only if 1M is not divisible by num mappers. So launching the job with num mappers 1, 2, 4, or 5 would succeed, while 6 will fail, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
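The masking effect described above is simple divisibility: the list wraps cleanly only when the node count is a multiple of the (mis-read) wrap width, so mapper counts that happen to divide 1M hide the bug. A quick stand-alone check (the numbers mirror the examples in the report; the method name is illustrative, not from the test code):

```java
public class WrapDivisibilityDemo {
    // The linked list wraps cleanly only when totalNodes is a multiple of wrapWidth.
    static boolean wrapsCleanly(long totalNodes, long wrapWidth) {
        return totalNodes % wrapWidth == 0;
    }

    public static void main(String[] args) {
        long total = 1_000_000L; // the 1M figure from the report
        // 1, 2, 4, 5 divide 1M and mask the bug; 6 does not and exposes it.
        for (long mappers : new long[] {1, 2, 4, 5, 6}) {
            System.out.println("num mappers " + mappers + " -> clean wrap: "
                + wrapsCleanly(total, mappers));
        }
    }
}
```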
[jira] [Updated] (HBASE-8377) IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly
[ https://issues.apache.org/jira/browse/HBASE-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-8377: - Attachment: hbase-8377_v1.patch Simple patch. IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly - Key: HBASE-8377 URL: https://issues.apache.org/jira/browse/HBASE-8377 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8377_v1.patch There is a bug in IntegrationTestBigLinkedList: it reads the wrong config key to calculate the wrap size for the linked list. It uses num mappers instead of num records per mapper. This has not been caught before, because it causes the test to fail only if 1M is not divisible by num mappers. So launching the job with num mappers 1, 2, 4, or 5 would succeed, while 6 will fail, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8377) IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly
[ https://issues.apache.org/jira/browse/HBASE-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-8377: - Status: Patch Available (was: Open) IntegrationTestBigLinkedList calculates wrap for linked list size incorrectly - Key: HBASE-8377 URL: https://issues.apache.org/jira/browse/HBASE-8377 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8377_v1.patch There is a bug in IntegrationTestBigLinkedList: it reads the wrong config key to calculate the wrap size for the linked list. It uses num mappers instead of num records per mapper. This has not been caught before, because it causes the test to fail only if 1M is not divisible by num mappers. So launching the job with num mappers 1, 2, 4, or 5 would succeed, while 6 will fail, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira