[jira] [Commented] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-06 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993334#comment-14993334
 ] 

Appy commented on HBASE-14771:
--

The patch looks good. Can you think of a unit test? Maybe one using a dummy 
service and method.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null because the Call object is 
> initialized with a null address. This seems to happen because 
> RpcServer.getRemoteIp() is used in the Call constructor before the RpcServer 
> thread-local 'CurCall' is set in the CallRunner.run method:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // getRemoteIp method gets address from threadlocal 'CurCall' which 
> // gets set in CallRunner.run and calling it before this as in above case, 
> will return null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp method in RpcServer.java seems 
> to be fixing this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
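
The ordering problem described above can be reproduced in miniature. The following is only a sketch under stated assumptions: ThreadLocalOrderingSketch, CUR_CALL_ADDR, buildCallBuggy and buildCallFixed are hypothetical names, not HBase classes or methods. It shows that a thread-local read before it is set yields null, while passing the address the connection already holds does not.

```java
// Hypothetical miniature of the pattern; none of these names are HBase classes.
class ThreadLocalOrderingSketch {
  // Stands in for RpcServer.CurCall: it is only set later, on the handler thread.
  static final ThreadLocal<String> CUR_CALL_ADDR = new ThreadLocal<>();

  // Stands in for RpcServer.getRemoteIp(): returns whatever this thread has set so far.
  static String remoteIpFromThreadLocal() {
    return CUR_CALL_ADDR.get();
  }

  static final class Call {
    final String remoteAddress;

    Call(String remoteAddress) {
      this.remoteAddress = remoteAddress;
    }
  }

  // Buggy ordering: the thread-local is read before any handler has set it.
  static Call buildCallBuggy(String connectionAddr) {
    return new Call(remoteIpFromThreadLocal());
  }

  // Fixed ordering: use the address the connection already knows (this.addr in the report).
  static Call buildCallFixed(String connectionAddr) {
    return new Call(connectionAddr);
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + buildCallBuggy("10.0.0.7").remoteAddress);
    System.out.println("fixed: " + buildCallFixed("10.0.0.7").remoteAddress);
  }
}
```

A unit test along the lines Appy asks for could assert the same two outcomes against a dummy service; how that test would be wired into RpcServer is not shown here.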



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14767) Remove deprecated functions from HBaseAdmin

2015-11-06 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-14767:
-
Attachment: HBASE-14767-master-v4.patch

v4: also changes usages in the shell code. Test run: TestShell.
Ready for review.

> Remove deprecated functions from HBaseAdmin
> ---
>
> Key: HBASE-14767
> URL: https://issues.apache.org/jira/browse/HBASE-14767
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14767-master-v2.patch, 
> HBASE-14767-master-v3.patch, HBASE-14767-master-v4.patch, 
> HBASE-14767-master.patch
>
>
> Many functions in HBaseAdmin are marked deprecated. Removing them. 





[jira] [Updated] (HBASE-14769) Remove unused functions and duplicate javadocs from HBaseAdmin

2015-11-06 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-14769:
-
Attachment: HBASE-14769-master-v4.patch

v4: fixes checkstyle errors.
Hudson will very likely still report +3 checkstyle errors of type 
"MissingDeprecated: Must include both @java.lang.Deprecated annotation and 
@deprecated Javadoc tag with description". That is because the patch removes 
Javadoc comments from functions whose parent function already carries the 
deprecated tag. Is that alright?

> Remove unused functions and duplicate javadocs from HBaseAdmin 
> ---
>
> Key: HBASE-14769
> URL: https://issues.apache.org/jira/browse/HBASE-14769
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14769-master-v2.patch, 
> HBASE-14769-master-v3.patch, HBASE-14769-master-v4.patch, 
> HBASE-14769-master.patch
>
>
> HBaseAdmin is marked private, so removing the functions not being used 
> anywhere.
> Also, the javadocs of overridden functions are same as corresponding ones in 
> Admin.java. Since javadocs are automatically inherited from the interface 
> class, we can remove these redundant 100s of lines.





[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13082:
---
Status: Open  (was: Patch Available)

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_1_WIP.patch, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, 
> HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, gc.png, 
> gc.png, gc.png, hits.png, next.png, next.png
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * Possible starving of flushes and compaction with heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
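
The idea of deferring to the outer lock can be illustrated with a small sketch. Everything here is an assumption made for illustration: CoarseLockSketch, InnerScanner and OuterScanner are hypothetical classes that only mimic the StoreScanner/RegionScanner relationship, and a ReentrantLock stands in for whatever locking the real scanners use. The inner scanner takes no lock of its own and is only safe because every call goes through the outer scanner.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical classes; they only mimic the RegionScanner/StoreScanner relationship.
class CoarseLockSketch {
  // Inner scanner: no lock of its own; it relies on the caller holding the outer lock.
  static final class InnerScanner {
    private final Iterator<String> cells;

    InnerScanner(List<String> cells) {
      this.cells = cells.iterator();
    }

    String next() {
      return cells.hasNext() ? cells.next() : null;
    }
  }

  // Outer scanner: the single place where locking happens.
  static final class OuterScanner {
    private final ReentrantLock lock = new ReentrantLock();
    private final InnerScanner inner;

    OuterScanner(InnerScanner inner) {
      this.inner = inner;
    }

    String next() {
      lock.lock();
      try {
        return inner.next();
      } finally {
        lock.unlock();
      }
    }
  }

  public static void main(String[] args) {
    OuterScanner scanner = new OuterScanner(new InnerScanner(Arrays.asList("row1", "row2")));
    System.out.println(scanner.next());
    System.out.println(scanner.next());
    System.out.println(scanner.next());
  }
}
```

The sketch also makes the first two drawbacks visible: any caller that reaches InnerScanner.next() without going through OuterScanner is unprotected. The flush and compaction starvation concern is not modeled.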





[jira] [Reopened] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-14706:


> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch
>
>
> Multiple RS can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}
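
One way to keep every server on a host is to map each hostname to a list. The sketch below is an illustration under assumptions, not the attached patch: HostToServersSketch, Server and groupByHost are hypothetical names, and Server is a minimal stand-in for HBase's ServerName that models only host and port.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the lookup in RegionLocationFinder; not the patch itself.
class HostToServersSketch {
  // Minimal stand-in for ServerName: only host and port are modeled.
  static final class Server {
    final String host;
    final int port;

    Server(String host, int port) {
      this.host = host;
      this.port = port;
    }

    @Override
    public String toString() {
      return host + ":" + port;
    }
  }

  // Group servers by hostname so that several region servers on one host are all kept.
  static Map<String, List<Server>> groupByHost(List<Server> regionServers) {
    Map<String, List<Server>> hostToServers = new HashMap<>();
    for (Server sn : regionServers) {
      hostToServers.computeIfAbsent(sn.host, h -> new ArrayList<>()).add(sn);
    }
    return hostToServers;
  }

  public static void main(String[] args) {
    List<Server> servers = new ArrayList<>();
    servers.add(new Server("host-a", 16020));
    servers.add(new Server("host-a", 16021));
    servers.add(new Server("host-b", 16020));
    System.out.println(groupByHost(servers));
  }
}
```

With the single-valued map from the description, the second put for host-a would overwrite the first; here both entries survive.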





[jira] [Updated] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14706:
---
Status: Patch Available  (was: Reopened)

> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch
>
>
> Multiple RS can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}





[jira] [Commented] (HBASE-14767) Remove deprecated functions from HBaseAdmin

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993460#comment-14993460
 ] 

Hadoop QA commented on HBASE-14767:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12770965/HBASE-14767-master-v3.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12770965

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 112 
new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16424//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16424//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16424//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16424//console

This message is automatically generated.

> Remove deprecated functions from HBaseAdmin
> ---
>
> Key: HBASE-14767
> URL: https://issues.apache.org/jira/browse/HBASE-14767
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14767-master-v2.patch, 
> HBASE-14767-master-v3.patch, HBASE-14767-master-v4.patch, 
> HBASE-14767-master.patch
>
>
> Many functions in HBaseAdmin are marked deprecated. Removing them. 





[jira] [Commented] (HBASE-14772) Improve zombie detector; be more discerning

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993466#comment-14993466
 ] 

Hadoop QA commented on HBASE-14772:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12770968/zombiev2.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12770968

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+echo "Found ${ZOMBIE_TESTS_COUNT} suspicious java process(es); waiting 
${wait}s to see if just slow to stop"
+  {color:red}-1 core zombie tests{color}.  There are ${ZOMBIE_TESTS_COUNT} 
possible zombie test(s): ${ZB_STACK}"
+
> Key: HBASE-14772
> URL: https://issues.apache.org/jira/browse/HBASE-14772
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Attachments: zombie.patch, zombiev2.patch
>
>
> Currently, any surefire process with the hbase flag is a potential zombie. 
> Our zombie check currently takes a reading and if it finds candidate zombies, 
> it waits 30 seconds and then does another reading. If a concurrent build is 
> going on, in both cases the zombie detector will come up positive though the 
> adjacent test run may be making progress; i.e. the cast of surefire processes 
> may have changed between readings but our detector just sees the presence of 
> hbase surefire processes.
> Here is an example:
> {code}
> Suspicious java process found - waiting 30s to see if there are just slow to 
> stop
> There appear to be 5 zombie tests, they should have been killed by surefire 
> but survived
> 12823 surefirebooter852180186418035480.jar -enableassertions -Dhbase.test 
> -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom 
> -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
> 7653 surefirebooter8579074445899448699.jar -enableassertions -Dhbase.test 
> -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom 
> -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
> 12614 surefirebooter136529596936417090.jar -enableassertions -Dhbase.test 
> -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom 
> -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
> 7836 surefirebooter3217047564606450448.jar -enableassertions -Dhbase.test 
> -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom 
> -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
> 13566 surefirebooter2084039411151963494.jar -enableassertions -Dhbase.test 
> -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom 
> -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
>  BEGIN zombies jstack extract
>  END  zombies jstack extract
> {code}
> 5 is the number of forked processes we allow when doing medium and large 
> tests so an adjacent build will always show as '5 zombies'.
> Need to add a check for whether the list of processes changes between readings.
> Can I also add a tag per build run that all forked processes pick up so I can 
> look at the current build's progeny only?
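
The "did the list change between readings" check can be sketched as a set intersection. This is only an illustration under assumptions: the actual detector is a shell script, and ZombieReadingSketch and persistentPids are hypothetical names. A process counts as suspicious only if its PID appears in both readings.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the real zombie detector is a shell script, not this class.
class ZombieReadingSketch {
  // A process is suspicious only if the same PID shows up in both readings.
  static Set<Integer> persistentPids(Set<Integer> firstReading, Set<Integer> secondReading) {
    Set<Integer> persistent = new TreeSet<>(firstReading);
    persistent.retainAll(secondReading);
    return persistent;
  }

  public static void main(String[] args) {
    Set<Integer> first = new TreeSet<>(Arrays.asList(12823, 7653, 12614));
    Set<Integer> second = new TreeSet<>(Arrays.asList(12823, 9001, 9002));
    // Only PIDs that survived the wait are reported.
    System.out.println(persistentPids(first, second));
  }
}
```

An adjacent build whose forked processes turn over between readings would then drop out of the result, although a long-running test in that build would still be flagged. The per-build tag suggested in the description would address that case.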





[jira] [Created] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Bhupendra Kumar Jain (JIRA)
Bhupendra Kumar Jain created HBASE-14777:


 Summary: Replication fails with IndexOutOfBoundsException
 Key: HBASE-14777
 URL: https://issues.apache.org/jira/browse/HBASE-14777
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.0.0, 1.2.0, 1.3.0
Reporter: Bhupendra Kumar Jain
Assignee: Bhupendra Kumar Jain
Priority: Critical


Replication fails with IndexOutOfBoundsException 
{code}
regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint
 threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.remove(Unknown Source)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
{code}

It's happening due to incorrect removal of entries from the replication entries 
list. 





[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Bhupendra Kumar Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993479#comment-14993479
 ] 

Bhupendra Kumar Jain commented on HBASE-14777:
--

The code removes the successful entries from the list of entries. Each removal 
from the list shifts the position of every subsequent element, which results 
in an IndexOutOfBoundsException.
{code}
for (Future f : futures) {
  try {
    // wait for all futures, remove successful parts
    // (only the remaining parts will be retried)
    entryLists.remove(f.get());
  } catch (InterruptedException ie) {
    iox = new IOException(ie);
  }
}
{code}

To handle this, we can iterate and remove in reverse order. 
{code}
int fLen = futures.size();
for (int fIndex = fLen - 1; fIndex >= 0; fIndex--) {
  try {
    // wait for all futures, remove successful parts
    // (only the remaining parts will be retried)
    entryLists.remove(futures.get(fIndex).get());
  } catch (InterruptedException ie) {
    iox = new IOException(ie);
  }
}
{code}
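
The index shift can be reproduced with a plain ArrayList. The sketch below is self-contained and uses hypothetical names (ReverseRemovalSketch, removeInGivenOrder, removeHighestFirst); it is not the replication endpoint code. It assumes the indexes are distinct, and it sorts them before removing from highest to lowest, which is a slightly stronger guarantee than iterating the futures in reverse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demonstration of index-based removal; not the replication endpoint code.
class ReverseRemovalSketch {
  // Removes indexes in the order supplied; fails once earlier removals have shifted the list.
  static List<String> removeInGivenOrder(List<String> source, int[] indexes) {
    List<String> copy = new ArrayList<>(source);
    for (int index : indexes) {
      copy.remove(index); // remove(int): every later element moves one slot left
    }
    return copy;
  }

  // Removes the same indexes from highest to lowest, so pending indexes stay valid.
  static List<String> removeHighestFirst(List<String> source, int[] indexes) {
    int[] sorted = indexes.clone();
    Arrays.sort(sorted);
    List<String> copy = new ArrayList<>(source);
    for (int i = sorted.length - 1; i >= 0; i--) {
      copy.remove(sorted[i]);
    }
    return copy;
  }

  public static void main(String[] args) {
    List<String> batches = Arrays.asList("batch0", "batch1");
    System.out.println(removeHighestFirst(batches, new int[] {0, 1}));
    try {
      removeInGivenOrder(batches, new int[] {0, 1});
    } catch (IndexOutOfBoundsException e) {
      // Same failure shape as the reported trace: index 1 on a list of size 1.
      System.out.println("ascending removal failed: " + e.getMessage());
    }
  }
}
```

With two batches that both succeed, ascending removal first deletes index 0, leaving a list of size 1, and then asks for index 1, which matches the "Index: 1, Size: 1" message in the report.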


> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint
>  threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 
> 1
>   at java.util.ArrayList.rangeCheck(Unknown Source)
>   at java.util.ArrayList.remove(Unknown Source)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication 
> entries list. 





[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993509#comment-14993509
 ] 

Anoop Sam John commented on HBASE-14463:


Can you attach patches for the other branches as well? We need to fix this in 
0.98+ versions.

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17
>
> Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, 
> HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v12.patch, 
> HBASE-14463_v2.patch, HBASE-14463_v3.patch, HBASE-14463_v4.patch, 
> HBASE-14463_v5.patch, TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png, pe_use_same_keys.patch, 
> test-results.tar.gz
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In this 
> scenario we launch hundreds of YARN workers and each worker reads 
> all features of one item (i.e. a single rowkey in HBase), so there is heavy 
> parallel reading on a single rowkey.
> We were using LruCache but recently started to try BucketCache to resolve a GC 
> issue, and, as the title says, we have observed a severe performance downgrade. 
> After some analysis we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
> try {
>   lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
>   // ...
>   if (bucketEntry.equals(backingMap.get(key))) {
>     // ...
>     int len = bucketEntry.getLength();
>     Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>         bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEngine.read involves an array copy, it is much more time-consuming than 
> the operation in LruCache. And since we're using synchronized in 
> IdLock#getLockEntry, parallel reads landing on the same bucket are 
> executed serially, which causes really bad performance.
> To resolve the problem, we propose to use ReentrantReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.
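
The proposal can be sketched as a map from offset to ReentrantReadWriteLock. This is a minimal illustration under assumptions and is not the IdReadWriteLock class from the patch: OffsetReadWriteLockSketch and getLock are hypothetical names, and the sketch never evicts unused locks, which a real implementation would have to handle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; this is not the IdReadWriteLock class from the patch.
class OffsetReadWriteLockSketch {
  // One lock per offset. Entries are never evicted here, which a real class must address.
  private final ConcurrentHashMap<Long, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  // Returns the lock for this offset, creating it on first use.
  ReentrantReadWriteLock getLock(long offset) {
    return locks.computeIfAbsent(offset, k -> new ReentrantReadWriteLock());
  }

  public static void main(String[] args) {
    OffsetReadWriteLockSketch offsetLock = new OffsetReadWriteLockSketch();
    ReentrantReadWriteLock lock = offsetLock.getLock(4096L);
    lock.readLock().lock();
    lock.readLock().lock(); // a second read hold does not block on the first
    System.out.println("read holds: " + lock.getReadLockCount());
    lock.readLock().unlock();
    lock.readLock().unlock();
  }
}
```

Readers of the same offset share the read lock and proceed in parallel, while a writer (for example, eviction of that block) would take the write lock and exclude them.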





[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993510#comment-14993510
 ] 

Anoop Sam John commented on HBASE-14355:


No problem..  Will commit it then. Thanks.

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v2.patch, 
> HBASE-14355-v3.patch, HBASE-14355-v4.patch, HBASE-14355-v5.patch, 
> HBASE-14355-v6.patch, HBASE-14355-v7.patch, HBASE-14355-v8.patch, 
> HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store it in a Map.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently have available which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use 
> which also appears to be a workaround for not having the family available). 
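
The fallback rule in the first proposal can be sketched as follows. This is an illustration under assumptions, not the HBase Scan API: ScanSketch and effectiveRange are hypothetical names, time ranges are modeled as a two-element long array, and ByteBuffer keys are used only because raw byte[] keys do not compare by content in a HashMap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; ScanSketch and its methods are not the HBase Scan API.
class ScanSketch {
  // Table-level range as {minTime, maxTime}; defaults to "all time".
  private long[] tableRange = {0L, Long.MAX_VALUE};
  // ByteBuffer keys compare by content, unlike raw byte[] keys.
  private final Map<ByteBuffer, long[]> familyRanges = new HashMap<>();

  void setTimeRange(long minTime, long maxTime) {
    tableRange = new long[] {minTime, maxTime};
  }

  void setTimeRange(byte[] family, long minTime, long maxTime) {
    familyRanges.put(ByteBuffer.wrap(family.clone()), new long[] {minTime, maxTime});
  }

  // Use the family's own range when one was set, otherwise the table-level range.
  long[] effectiveRange(byte[] family) {
    long[] range = familyRanges.get(ByteBuffer.wrap(family));
    return range != null ? range : tableRange;
  }

  public static void main(String[] args) {
    ScanSketch scan = new ScanSketch();
    scan.setTimeRange(0L, 100L);
    scan.setTimeRange("cf1".getBytes(StandardCharsets.UTF_8), 50L, 60L);
    long[] cf1 = scan.effectiveRange("cf1".getBytes(StandardCharsets.UTF_8));
    long[] cf2 = scan.effectiveRange("cf2".getBytes(StandardCharsets.UTF_8));
    System.out.println("cf1: " + cf1[0] + ".." + cf1[1]);
    System.out.println("cf2: " + cf2[0] + ".." + cf2[1]);
  }
}
```

The second question in the description, how the store file scanner learns which family it belongs to, is outside what this sketch covers.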





[jira] [Commented] (HBASE-14769) Remove unused functions and duplicate javadocs from HBaseAdmin

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993538#comment-14993538
 ] 

Hadoop QA commented on HBASE-14769:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12770980/HBASE-14769-master-v4.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12770980

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1729 checkstyle errors (more than the master's current 1726 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16427//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16427//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16427//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16427//console

This message is automatically generated.

> Remove unused functions and duplicate javadocs from HBaseAdmin 
> ---
>
> Key: HBASE-14769
> URL: https://issues.apache.org/jira/browse/HBASE-14769
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14769-master-v2.patch, 
> HBASE-14769-master-v3.patch, HBASE-14769-master-v4.patch, 
> HBASE-14769-master.patch
>
>
> HBaseAdmin is marked private, so removing the functions not being used 
> anywhere.
> Also, the javadocs of overridden functions are same as corresponding ones in 
> Admin.java. Since javadocs are automatically inherited from the interface 
> class, we can remove these redundant 100s of lines.





[jira] [Commented] (HBASE-14427) Fix 'should' assertions in TestFastFail

2015-11-06 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993543#comment-14993543
 ] 

Abhishek Singh Chouhan commented on HBASE-14427:


Since the fix already went in 
(https://github.com/apache/hbase/commit/b3afdb8de1a9fa88c553159b2d2d2aa96902a345), 
pushing only the addendum to re-enable it should do, I guess. If a patch 
adds or re-enables a test, does QA also run the added tests for that same 
patch? If yes, then we can probably get QA runs and see how the test is faring.

> Fix 'should' assertions in TestFastFail
> ---
>
> Key: HBASE-14427
> URL: https://issues.apache.org/jira/browse/HBASE-14427
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
>  Labels: beginner
> Fix For: 2.0.0
>
> Attachments: HBASE-14427.addendum.patch, HBASE-14427.patch
>
>
> Over in HBASE-14421, TestFastFail has been failing assertions that talk of 
> events that 'should' be happening. Fix. For now HBASE-14421 has disabled the 
> 'should' assertions. They seem fine on apache jenkins build but fail fairly 
> reliably for me on alternate HW.
> To address, get familiar with the test. Change the commented out asserts to 
> be yes/no instead of a 'likely' (On a cursory scan, it is possible that a 
> test run may not involve preemption and it is these runs that are throwing 
> asserts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14767) Remove deprecated functions from HBaseAdmin

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993569#comment-14993569
 ] 

Hadoop QA commented on HBASE-14767:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12770979/HBASE-14767-master-v4.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12770979

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 112 
new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16426//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16426//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16426//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16426//console

This message is automatically generated.

> Remove deprecated functions from HBaseAdmin
> ---
>
> Key: HBASE-14767
> URL: https://issues.apache.org/jira/browse/HBASE-14767
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14767-master-v2.patch, 
> HBASE-14767-master-v3.patch, HBASE-14767-master-v4.patch, 
> HBASE-14767-master.patch
>
>
> Many functions in HBaseAdmin are marked deprecated. Removing them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13082:
---
Attachment: HBASE-13082_2.pdf

Attaching an updated PDF that removes the point on the secondary replicas and 
the cleaner archiving the primary replica's compacted files.

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_1_WIP.patch, HBASE-13082_2.pdf, HBASE-13082_2_WIP.patch, 
> HBASE-13082_3.patch, HBASE-13082_4.patch, HBASE-13082_9.patch, 
> HBASE-13082_9.patch, gc.png, gc.png, gc.png, hits.png, next.png, next.png
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example, Phoenix does not lock RegionScanner.nextRaw() as 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * Possible starving of flushes and compactions under heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
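
The lock coarsening described above can be sketched in isolation. This is only 
an illustration under stated assumptions, not HBase code: CoarsenedLockSketch, 
InnerScanner and OuterScanner are hypothetical stand-ins for StoreScanner and 
RegionScanner, and the "scan" is just a counter. The inner object takes no lock 
of its own and is only safe because every call reaches it through the 
synchronized outer method, which is why the first drawback in the list (all 
calls must remain synchronized) matters.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of lock coarsening; not HBase code.
class CoarsenedLockSketch {

  // Stand-in for the inner scanner: no locking and no memory fences of its own.
  static final class InnerScanner {
    private int position = 0;

    int next() {
      return position++; // not thread-safe unless the caller holds a lock
    }
  }

  // Stand-in for the outer scanner: the only lock lives here.
  static final class OuterScanner {
    private final InnerScanner inner = new InnerScanner();

    synchronized int next() {
      return inner.next(); // inner state is protected by the outer monitor
    }
  }

  // Calls next() from several threads and counts the distinct values returned.
  // With the outer lock in place, no value is handed out twice.
  static int distinctValuesSeen(int threads, int callsPerThread) {
    OuterScanner scanner = new OuterScanner();
    Set<Integer> seen = ConcurrentHashMap.newKeySet();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < callsPerThread; j++) {
          seen.add(scanner.next());
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("distinct values: " + distinctValuesSeen(4, 1000));
  }
}
```

Any caller that bypasses OuterScanner and calls InnerScanner.next() directly 
loses this guarantee, which mirrors the coprocessor concern in the list above.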





[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13082:
---
Attachment: HBASE-13082_12.patch

Updated patch for QA. It covers everything mentioned in the doc. One piece of 
feedback not yet addressed is that the file status (DISCARDED and ACTIVE) is 
set on the Reader in the StoreFile. This is because the inner class Reader in 
StoreFile is designed in such a way that, even if we want the scanner 
associated with this store file, we need to operate on this Reader object rather 
than on the store file. So I thought it better that the file status remains 
there. Anyway, feedback is welcome! I will post the patch on RB too.
With the PE tool, 10G of data, and everything in cache, I could see around 70 
to 90 secs difference in completion time on average (purely measuring the 
server-side gain).
{code}
./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --oneCon=true  
--caching=5000 --filterAll=true --rows=1  scanRange1 50
{code}



> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_12.patch, HBASE-13082_1_WIP.patch, HBASE-13082_2.pdf, 
> HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, 
> HBASE-13082_9.patch, HBASE-13082_9.patch, gc.png, gc.png, gc.png, hits.png, 
> next.png, next.png


[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13082:
---
Status: Patch Available  (was: Open)

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_12.patch, HBASE-13082_1_WIP.patch, HBASE-13082_2.pdf, 
> HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, 
> HBASE-13082_9.patch, HBASE-13082_9.patch, gc.png, gc.png, gc.png, hits.png, 
> next.png, next.png


[jira] [Updated] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Bhupendra Kumar Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupendra Kumar Jain updated HBASE-14777:
-
Attachment: HBASE-14777.patch

Please review the attached patch

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint
>  threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 
> 1
>   at java.util.ArrayList.rangeCheck(Unknown Source)
>   at java.util.ArrayList.remove(Unknown Source)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication 
> entries list. 
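
One common way an ArrayList.remove call ends in "Index: 1, Size: 1" can be 
reproduced in isolation. The sketch below is not the HBase code and not the 
attached patch; IndexRemovalSketch and its methods are hypothetical names, and 
whether the real bug has exactly this shape is an assumption. It shows that 
removing positions in ascending order shifts the remaining elements left, so a 
position computed earlier can go stale, while removing from the highest 
position down does not have that problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the HBase replication code.
class IndexRemovalSketch {

  // Removes positions in the order given. Every removal shifts later
  // elements left, so positions computed up front can become invalid.
  static <T> void removeInGivenOrder(List<T> list, List<Integer> positions) {
    for (int position : positions) {
      list.remove(position); // int argument: removes by position
    }
  }

  // Removes positions from highest to lowest, so earlier removals never
  // shift an element that still has to be removed.
  static <T> void removeDescending(List<T> list, List<Integer> positions) {
    List<Integer> sorted = new ArrayList<>(positions);
    sorted.sort(Collections.reverseOrder());
    for (int position : sorted) {
      list.remove(position);
    }
  }

  public static void main(String[] args) {
    List<String> entries = new ArrayList<>(Arrays.asList("a", "b"));
    try {
      removeInGivenOrder(entries, Arrays.asList(0, 1));
    } catch (IndexOutOfBoundsException e) {
      System.out.println("stale position: " + e.getMessage());
    }

    List<String> safe = new ArrayList<>(Arrays.asList("a", "b"));
    removeDescending(safe, Arrays.asList(0, 1));
    System.out.println("remaining after descending removal: " + safe);
  }
}
```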





[jira] [Updated] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Bhupendra Kumar Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhupendra Kumar Jain updated HBASE-14777:
-
Status: Patch Available  (was: Open)

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch


[jira] [Updated] (HBASE-14749) Make changes to region_mover.rb to use RegionMover Java tool

2015-11-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-14749:
---
Attachment: HBASE-14749.patch

> Make changes to region_mover.rb to use RegionMover Java tool
> 
>
> Key: HBASE-14749
> URL: https://issues.apache.org/jira/browse/HBASE-14749
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Attachments: HBASE-14749.patch
>
>
> With HBASE-13014 in, we can now replace the ruby script such that it invokes 
> the Java Tool. Also expose timeout and no-ack mode which were added.





[jira] [Commented] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993636#comment-14993636
 ] 

Hadoop QA commented on HBASE-14706:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12770751/HBASE-14706-branch-1.1.patch
  against branch-1.1 branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12770751

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16428//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16428//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16428//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16428//console

This message is automatically generated.

> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch
>
>
> Multiple RSs can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}
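
The problem with a one-to-one map can be shown with a small sketch that keeps 
every server per host instead of only the last one put. This is an 
illustration only, not the attached patch: HostToServersSketch and groupByHost 
are hypothetical names, and server names are modelled as plain 
"host,port,startcode" strings rather than ServerName objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the RegionLocationFinder code.
class HostToServersSketch {

  // Groups server names by host so that several servers on one host all
  // survive, instead of each put() overwriting the previous entry.
  static Map<String, List<String>> groupByHost(List<String> serverNames) {
    Map<String, List<String>> hostToServers = new HashMap<>();
    for (String serverName : serverNames) {
      // Assumption: the host is the part before the first comma.
      int comma = serverName.indexOf(',');
      String host = comma < 0 ? serverName : serverName.substring(0, comma);
      hostToServers.computeIfAbsent(host, h -> new ArrayList<>()).add(serverName);
    }
    return hostToServers;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList(
        "host1,16020,1", "host1,16021,2", "host2,16020,3");
    System.out.println(groupByHost(servers));
  }
}
```

A lookup by host then returns a list, and the caller decides how to treat 
several region servers that share the same locality.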





[jira] [Updated] (HBASE-14749) Make changes to region_mover.rb to use RegionMover Java tool

2015-11-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-14749:
---
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

Getting a QA

> Make changes to region_mover.rb to use RegionMover Java tool
> 
>
> Key: HBASE-14749
> URL: https://issues.apache.org/jira/browse/HBASE-14749
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-14749.patch


[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993688#comment-14993688
 ] 

Hadoop QA commented on HBASE-13082:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771006/HBASE-13082_12.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771006

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 46 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1728 checkstyle errors (more than the master's current 1726 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16429//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16429//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16429//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16429//console

This message is automatically generated.

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_12.patch, HBASE-13082_1_WIP.patch, HBASE-13082_2.pdf, 
> HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, 
> HBASE-13082_9.patch, HBASE-13082_9.patch, gc.png, gc.png, gc.png, hits.png, 
> next.png, next.png


[jira] [Updated] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-06 Thread Abhishek Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar updated HBASE-14771:
---
Attachment: HBASE-14771-V1.patch

Test case added; please review.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771-V1.patch, HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null because the Call object is 
> initialized with a null address. This seems to happen because 
> RpcServer.getRemoteIp() is used in the Call constructor before the RpcServer 
> thread-local 'CurCall' is set in the CallRunner.run method:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // getRemoteIp gets the address from the thread-local 'CurCall', which is 
> // set in CallRunner.run; calling it before that, as above, returns null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp in RpcServer.java seems to 
> // fix this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
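
The ordering problem described above can be reproduced with a plain 
ThreadLocal. This is a minimal sketch, not RpcServer code: 
ThreadLocalOrderingSketch, CURRENT_CALL and the method names are hypothetical 
stand-ins for 'CurCall' and getRemoteIp, and the address is a plain string. 
Reading the thread-local before anything has set it on the current thread 
yields null; reading after the set yields the value.

```java
// Hypothetical illustration; not the RpcServer or CallRunner code.
class ThreadLocalOrderingSketch {

  // Stand-in for the thread-local 'CurCall' named in the issue.
  static final ThreadLocal<String> CURRENT_CALL = new ThreadLocal<>();

  // Stand-in for a getter that depends on the thread-local being set.
  static String remoteAddressFromThreadLocal() {
    return CURRENT_CALL.get();
  }

  // Returns what the getter sees before and after the thread-local is set.
  static String[] readBeforeAndAfterSet(String connectionAddress) {
    String before = remoteAddressFromThreadLocal(); // nothing set yet on this thread
    CURRENT_CALL.set(connectionAddress);
    try {
      String after = remoteAddressFromThreadLocal();
      return new String[] {before, after};
    } finally {
      CURRENT_CALL.remove(); // leave the thread clean for later callers
    }
  }

  public static void main(String[] args) {
    String[] seen = readBeforeAndAfterSet("10.0.0.1");
    System.out.println("before set: " + seen[0]);
    System.out.println("after set: " + seen[1]);
  }
}
```

Passing a value that is already at hand, as the proposed use of 'this.addr' 
does, avoids depending on when the thread-local is populated.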





[jira] [Updated] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14706:
---
   Resolution: Fixed
Fix Version/s: 1.1.3
   Status: Resolved  (was: Patch Available)

> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch


[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993728#comment-14993728
 ] 

Ted Yu commented on HBASE-14777:


+1 if tests pass.

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch


[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993795#comment-14993795
 ] 

Hadoop QA commented on HBASE-14777:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771008/HBASE-14777.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771008

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16430//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16430//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16430//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16430//console

This message is automatically generated.

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch


[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993831#comment-14993831
 ] 

Ted Yu commented on HBASE-14463:


HBASE-14268 went into 1.2+

For 0.98, we need to backport HBASE-14268 first

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17
>
> Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, 
> HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v12.patch, 
> HBASE-14463_v2.patch, HBASE-14463_v3.patch, HBASE-14463_v4.patch, 
> HBASE-14463_v5.patch, TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png, pe_use_same_keys.patch, 
> test-results.tar.gz
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In such a 
> scenario we launch hundreds of YARN workers, and each worker reads 
> all features of one item (i.e. a single rowkey in HBase), so there is heavy 
> parallel reading on a single rowkey.
> We were using LruCache but recently started trying BucketCache to resolve a GC 
> issue, and, as the title says, we have observed a severe performance downgrade. 
> After some analysis we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
>   try {
> lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
> // ...
> if (bucketEntry.equals(backingMap.get(key))) {
>   // ...
>   int len = bucketEntry.getLength();
>   Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>   bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEngine.read involves an array copy, it is much more time-consuming than 
> the corresponding operation in LruCache. And since IdLock#getLockEntry uses 
> synchronized, parallel reads that land on the same bucket are executed 
> serially, which causes really bad performance.
> To resolve the problem, we propose to use ReentrantReadWriteLock in 
> BucketCache, and to introduce a new class called IdReadWriteLock to implement it.
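
The difference between an exclusive per-id lock and a per-id read-write lock 
can be sketched as follows. This is not the IdReadWriteLock from the attached 
patches: PerIdReadWriteLockSketch and its methods are hypothetical, and the 
sketch never evicts locks from its map, which a long-running cache would have 
to address. It only shows that two readers of the same id can hold the lock at 
the same time while a writer is kept out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration; not the IdReadWriteLock from the patches.
class PerIdReadWriteLockSketch {

  // One read-write lock per id. Entries are never removed in this sketch.
  private final ConcurrentMap<Long, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  ReentrantReadWriteLock getLock(long id) {
    return locks.computeIfAbsent(id, key -> new ReentrantReadWriteLock());
  }

  // Tries the given lock once from a fresh thread and reports whether it
  // could be acquired; the lock is released again before returning.
  static boolean tryFromOtherThread(Lock lock) {
    AtomicBoolean acquired = new AtomicBoolean(false);
    Thread worker = new Thread(() -> {
      if (lock.tryLock()) {
        acquired.set(true);
        lock.unlock();
      }
    });
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return acquired.get();
  }

  public static void main(String[] args) {
    PerIdReadWriteLockSketch pool = new PerIdReadWriteLockSketch();
    ReentrantReadWriteLock lock = pool.getLock(42L);
    lock.readLock().lock(); // this thread is now a reader of id 42
    try {
      System.out.println("second reader admitted: "
          + tryFromOtherThread(lock.readLock()));
      System.out.println("writer admitted: "
          + tryFromOtherThread(lock.writeLock()));
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With a synchronized or otherwise exclusive per-id lock, the second reader 
would have to wait for the first, which is the serialization described above.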





[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993857#comment-14993857
 ] 

Ted Yu commented on HBASE-14632:


Ping [~eclark]

> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:302)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is related code:
> {code}
>   public long getTotalStaticIndexSize() {
> long size = 0;
> for (StoreFile s : 
> this.storeEngine.getStoreFileManager().getStorefiles()) {
>   size += s.getReader().getUncompressedDataIndexSize();
> }
> return size;
>   }
> {code}
> Some methods, such as getStorefilesIndexSize(), guard against null Reader by 
> checking r against null.
>
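
The guard mentioned above could look roughly like the following. This is a minimal sketch with hypothetical stand-in types (`StoreFile`, `Reader`, and the `List` parameter are simplified here and are not the real HBase classes); it only illustrates skipping store files whose Reader is null instead of dereferencing it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for StoreFile and its Reader; not the real HBase classes.
final class NullGuardSketch {
  interface Reader {
    long getUncompressedDataIndexSize();
  }

  static final class StoreFile {
    private final Reader reader; // may be null, as in the WARN lines of the log above

    StoreFile(Reader reader) {
      this.reader = reader;
    }

    Reader getReader() {
      return reader;
    }
  }

  // Sums index sizes, skipping store files whose Reader is null.
  static long getTotalStaticIndexSize(List<StoreFile> storeFiles) {
    long size = 0;
    for (StoreFile s : storeFiles) {
      Reader r = s.getReader();
      if (r == null) {
        continue; // guard: nothing to add for a file without a reader
      }
      size += r.getUncompressedDataIndexSize();
    }
    return size;
  }

  public static void main(String[] args) {
    List<StoreFile> files = Arrays.asList(
        new StoreFile(() -> 10L), new StoreFile(null), new StoreFile(() -> 32L));
    System.out.println(getTotalStaticIndexSize(files));
  }
}
```

Whether a null Reader should be skipped silently or logged is a separate design decision; the sketch simply skips it.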

[jira] [Commented] (HBASE-14749) Make changes to region_mover.rb to use RegionMover Java tool

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993860#comment-14993860
 ] 

Hadoop QA commented on HBASE-14749:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771011/HBASE-14749.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771011

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1727 checkstyle errors (more than the master's current 1726 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  echo "Usage: graceful_stop.sh [--config ] [-e] [--restart 
[--reload]] [--thrift] [--rest] "
+  echo " n|noackEnable noAck mode in RegionMover. This is a best 
effort mode for moving regions"
+  echo " movetimeout xx Timeout for moving regions. If regions are not moved 
by the timeout value, exit with error. Default value is INT_MAX."
+HBASE_NOEXEC=true "$bin"/hbase --config ${HBASE_CONF_DIR} 
org.apache.hadoop.hbase.util.RegionMover --filename $filename --maxthreads 
$maxthreads $noack --operation "unload" --timeout $movetimeout 
--regionserverhost $hostname
+HBASE_NOEXEC=true "$bin"/hbase --config ${HBASE_CONF_DIR} 
org.apache.hadoop.hbase.util.RegionMover --filename $filename --maxthreads 
$maxthreads $noack --operation "load" --timeout $movetimeout --regionserverhost 
$hostname
+usage_str="Usage: `basename $0` [--config ] [--rs-only] 
[--master-only] [--graceful [--maxthreads xx] [--noack] [--movetimeout]]"
+"$bin"/graceful_stop.sh --config ${HBASE_CONF_DIR} --restart --reload 
--maxthreads ${RR_MAXTHREADS} ${RR_NOACK} --movetimeout ${RR_MOVE_TIMEOUT} 
$hostname
+  + "RegionServer, hence best effort. This is more performant in unloading 
and loading but might "

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

  {color:red}-1 core zombie tests{color}.  There are possible 5 zombie 
test(s): at 
org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:428)
at 
org.apache.hadoop.hbase.regionserver.wal.TestWALReplay.testReplayEditsWrittenIntoWAL(TestWALReplay.java:810)
at 
org.apache.hadoop.hbase.regionserver.TestRemoveRegionMetrics.testMoveRegion(TestRemoveRegionMetrics.java:119)
at 
org.apache.hadoop.hbase.regionserver.TestCompoundBloomFilter.testCompoundBloomFilter(TestCompoundBloomFilter.java:161)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16431//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16431//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16431//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16431//console

This message is automatically generated.

> Make changes to region_mover.rb to use RegionMover Java tool
> 
>
> Key: HBASE-14749
> URL: https://issues.apache.org/jira/browse/HBASE-14749
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-14749.patch
>
>
> With HBASE-13014 in, we can now replace the ruby script such that it invokes 
> the Java Tool. Also expose timeout and no-ack mode which were added.



--
This message was sent by Atl

[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993988#comment-14993988
 ] 

Ashish Singhi commented on HBASE-14777:
---

+1 (non-binding)

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint
>  threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 
> 1
>   at java.util.ArrayList.rangeCheck(Unknown Source)
>   at java.util.ArrayList.remove(Unknown Source)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication 
> entries list. 
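
One common way to hit this kind of exception is removing several entries from an ArrayList by index in ascending order: each removal shifts the remaining elements left, so the later indices go stale. The sketch below is only an illustration of that general pitfall, under the assumption that something similar happens here; the class and method names are hypothetical and it is not the code from HBaseInterClusterReplicationEndpoint or from the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the actual replication endpoint code.
final class IndexRemovalSketch {
  // Unsafe: each remove shifts later elements left, so later indices go stale.
  static void removeAscending(List<String> list, int[] sortedIndices) {
    for (int idx : sortedIndices) {
      list.remove(idx); // resolves to remove(int index)
    }
  }

  // Safer: remove from the highest index down so lower indices stay valid.
  static void removeDescending(List<String> list, int[] sortedIndices) {
    for (int i = sortedIndices.length - 1; i >= 0; i--) {
      list.remove(sortedIndices[i]);
    }
  }

  public static void main(String[] args) {
    int[] done = {0, 1};

    List<String> a = new ArrayList<>(Arrays.asList("batch0", "batch1"));
    try {
      removeAscending(a, done);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("ascending removal failed: " + e.getMessage());
    }

    List<String> b = new ArrayList<>(Arrays.asList("batch0", "batch1"));
    removeDescending(b, done);
    System.out.println("descending removal left " + b.size() + " entries");
  }
}
```

With two entries and indices 0 and 1, the ascending variant fails on the second removal because the list has already shrunk to size 1.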



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2015-11-06 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993993#comment-14993993
 ] 

Samir Ahmic commented on HBASE-14223:
-

Thanks for the explanation [~enis]. I will keep digging around this issue.

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch
>
>
> When an RS opens meta, and later closes it, the WAL (FSHLog) is not closed. 
> The last WAL file just sits there in the RS WAL directory. If RS stops 
> gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for 
> meta is not cleaned. It is also not split (which is correct) since master 
> determines that the RS no longer hosts meta at the time of RS abort. 
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM later killed the region server, the master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.open

[jira] [Commented] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993995#comment-14993995
 ] 

Hadoop QA commented on HBASE-14771:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771025/HBASE-14771-V1.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771025

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16432//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16432//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16432//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16432//console

This message is automatically generated.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771-V1.patch, HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null, because the Call object is 
> getting initialized with null. This seems to be happening because 
> RpcServer.getRemoteIp() is used in the Call object constructor before the 
> RpcServer thread local 'CurCall' is set in the CallRunner.run method:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // getRemoteIp method gets address from threadlocal 'CurCall' which 
> // gets set in CallRunner.run; calling it before that, as in the above case, 
> // will return null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp method in RpcServer.java seems 
> // to be fixing this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
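
The ordering problem described above can be reproduced in miniature outside HBase. The following is a sketch under stated assumptions: `CUR_CALL`, `Call`, and the two builder methods are hypothetical simplifications, not the real RpcServer API. It shows that a value read from a thread-local before any thread has set it is null, while an address passed in directly from the connection is not.

```java
// Hypothetical miniature of the pattern described above; names are illustrative only.
final class ThreadLocalOrderSketch {
  static final ThreadLocal<String> CUR_CALL = new ThreadLocal<>();

  // Mirrors a static accessor that depends on the thread-local being set.
  static String getRemoteIp() {
    return CUR_CALL.get(); // null on any thread that has not called set()
  }

  static final class Call {
    final String remoteAddress;

    Call(String remoteAddress) {
      this.remoteAddress = remoteAddress;
    }
  }

  // Reader side: builds the Call before any handler has set CUR_CALL.
  static Call buildFromThreadLocal() {
    return new Call(getRemoteIp());
  }

  // The alternative from the description: pass the connection's own address.
  static Call buildFromConnection(String addr) {
    return new Call(addr);
  }

  public static void main(String[] args) throws InterruptedException {
    Call broken = buildFromThreadLocal();
    Call fixed = buildFromConnection("10.0.0.1");

    // The handler sets the thread-local later, and on a different thread,
    // so the Call built above never sees it.
    Thread handler = new Thread(() -> CUR_CALL.set("10.0.0.1"));
    handler.start();
    handler.join();

    System.out.println("from thread-local: " + broken.remoteAddress);
    System.out.println("from connection:   " + fixed.remoteAddress);
  }
}
```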





[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-13153:
--
Status: Patch Available  (was: Open)

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster 
> tolerance scenarios. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise making some modifications to the bulkload feature to 
> enable bulkload to both the active cluster and the backup cluster.





[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-13153:
--
Attachment: HBASE-13153-v12.patch

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster 
> tolerance scenarios. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise making some modifications to the bulkload feature to 
> enable bulkload to both the active cluster and the backup cluster.





[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-13153:
--
Attachment: HBase Bulk Load Replication-v3.pdf

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster 
> tolerance scenarios. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise making some modifications to the bulkload feature to 
> enable bulkload to both the active cluster and the backup cluster.





[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994071#comment-14994071
 ] 

Ashish Singhi commented on HBASE-13153:
---

Attached updated design doc and patch.
Please review.

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster 
> tolerance scenarios. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise making some modifications to the bulkload feature to 
> enable bulkload to both the active cluster and the backup cluster.





[jira] [Updated] (HBASE-14605) Split fails due to 'No valid credentials' error when SecureBulkLoadEndpoint#start tries to access hdfs

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14605:
---
Release Note: 
When split is requested by non-super user, split related notifications for 
Coprocessor are executed using the login of the request user.
Previously the notifications were carried out as super user.

> Split fails due to 'No valid credentials' error when 
> SecureBulkLoadEndpoint#start tries to access hdfs
> --
>
> Key: HBASE-14605
> URL: https://issues.apache.org/jira/browse/HBASE-14605
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 144605-branch-1-v3.txt, 14605-0.98-v5.txt, 
> 14605-branch-1-addendum.txt, 14605-branch-1-v4.txt, 14605-branch-1-v5.txt, 
> 14605-branch-1.0-v5.txt, 14605-v1.txt, 14605-v2.txt, 14605-v3.txt, 
> 14605-v3.txt, 14605-v3.txt, 14605-v4.txt, 14605-v5.txt, 14605.alt
>
>
> During recent testing in secure cluster (with HBASE-14475), we found the 
> following when user X (non-super user) split a table with region replica:
> {code}
> 2015-10-12 10:58:18,955 ERROR [FifoRpcScheduler.handler1-thread-9] 
> master.HMaster: Region server hbase-4-4.novalocal,60020,1444645588137 
> reported a fatal error:
> ABORTING region server hbase-4-4.novalocal,60020,1444645588137: The 
> coprocessor org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint 
> threw an unexpected   exception
> Cause:
> java.lang.IllegalStateException: Failed to get FileSystem instance
>   at 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint.start(SecureBulkLoadEndpoint.java:148)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment.startup(CoprocessorHost.java:415)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadInstance(CoprocessorHost.java:257)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadSystemCoprocessors(CoprocessorHost.java:160)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.(RegionCoprocessorHost.java:192)
>   at org.apache.hadoop.hbase.regionserver.HRegion.(HRegion.java:701)
>   at org.apache.hadoop.hbase.regionserver.HRegion.(HRegion.java:608)
> ...
> Caused by: java.io.IOException: Failed on local exception: 
> java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed 
> [Caused by GSSException: No valid  credentials provided (Mechanism 
> level: Failed to find any Kerberos tgt)]; Host Details : local host is: 
> "hbase-4-4/172.22.66.186"; destination host is: "os-r6-  
> okarus-hbase-4-2.novalocal":8020;
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1473)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy18.mkdirs(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:555)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy19.mkdirs(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2775)
>   at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2746)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:967)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:963)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> {code}
> The cause was that SecureBulkLoadEndpoint#start tried to create staging dir 
> in hdfs as user X but didn't pass authentication.





[jira] [Created] (HBASE-14778) Make block cache hit percentages not integer in the metrics system

2015-11-06 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-14778:
-

 Summary: Make block cache hit percentages not integer in the 
metrics system
 Key: HBASE-14778
 URL: https://issues.apache.org/jira/browse/HBASE-14778
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark


Once you're close to 90%+, it's hard to see a difference because getting a 
full percent change is rare.
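
To illustrate the point, here is a minimal sketch comparing an integer percentage with a floating-point one. The method names and the hit/request counts are hypothetical and are not taken from the HBase metrics code; the sketch only shows that integer division hides differences smaller than one percent.

```java
// Hypothetical helper names; not the HBase metrics API.
final class HitPercentSketch {
  // Integer percentage: truncates, so 99.1% and 99.9% both report 99.
  static long hitPercentAsLong(long hits, long requests) {
    return requests == 0 ? 0 : (hits * 100) / requests;
  }

  // Floating-point percentage: keeps the fractional part.
  static double hitPercentAsDouble(long hits, long requests) {
    return requests == 0 ? 0.0 : (hits * 100.0) / requests;
  }

  public static void main(String[] args) {
    System.out.println(hitPercentAsLong(9910, 10000) + " vs " + hitPercentAsLong(9990, 10000));
    System.out.println(hitPercentAsDouble(9910, 10000) + " vs " + hitPercentAsDouble(9990, 10000));
  }
}
```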





[jira] [Updated] (HBASE-14717) Enable_table_replication should not create table in peer cluster if specified few tables added in peer

2015-11-06 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-14717:
--
Attachment: HBASE-14717(1).patch

Retry...

> Enable_table_replication should not create table in peer cluster if specified 
> few tables added in peer
> --
>
> Key: HBASE-14717
> URL: https://issues.apache.org/jira/browse/HBASE-14717
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.0.2
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Ashish Singhi
> Attachments: HBASE-14717(1).patch, HBASE-14717.patch
>
>
> For a peer, only user-specified tables should be created, but the 
> enable_table_replication command is not honouring that.
> For example:
> like peer1 : t1:cf1, t2
> create 't3', 'd'
> enable_table_replication 't3' > should not create t3 in peer1





[jira] [Commented] (HBASE-14766) WALEntryFilter's filter implement, cell.getFamily() needs to be replaced with the new low-cost implementation.

2015-11-06 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994106#comment-14994106
 ] 

huaxiang sun commented on HBASE-14766:
--

The change is covered by the existing test case 
TestPerTableCFReplication.testPerTableCFReplication(). The unit test failure is 
not related to this change, as HBASE-14766-v002.patch passed the unit tests. The 
only difference between v002 and v003 is code reformatting.

> WALEntryFilter's filter implement, cell.getFamily() needs to be replaced with 
> the new low-cost implementation.
> --
>
> Key: HBASE-14766
> URL: https://issues.apache.org/jira/browse/HBASE-14766
> Project: HBase
>  Issue Type: Improvement
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-14766-v001.patch, HBASE-14766-v002.patch, 
> HBASE-14766-v003.patch
>
>
> Cell's getFamily() gets an array copy of the cell's family, while in the 
> filter function,  it just needs to peek into the family and do a compare. 
> Replace 
> Bytes.toString(cell.getFamily())
> with 
> Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), 
> cell.getFamilyLength())
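
The difference between the two access patterns can be sketched without HBase on the classpath. In the block below, `FamilyPeekSketch` is a hypothetical stand-in for a cell backed by one shared byte array (it is not the HBase Cell or Bytes API); it contrasts copying the family bytes out first with reading them in place through an offset and a length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a cell backed by one shared byte array; not the HBase Cell API.
final class FamilyPeekSketch {
  private final byte[] backing;
  private final int familyOffset;
  private final int familyLength;

  FamilyPeekSketch(byte[] backing, int familyOffset, int familyLength) {
    this.backing = backing;
    this.familyOffset = familyOffset;
    this.familyLength = familyLength;
  }

  // Copying accessor: allocates a new array on every call.
  byte[] copyFamily() {
    return Arrays.copyOfRange(backing, familyOffset, familyOffset + familyLength);
  }

  // Decodes via the copy: one extra byte[] allocation per call.
  String familyViaCopy() {
    return new String(copyFamily(), StandardCharsets.UTF_8);
  }

  // Decodes straight from the backing array: no intermediate byte[] copy.
  String familyInPlace() {
    return new String(backing, familyOffset, familyLength, StandardCharsets.UTF_8);
  }

  // Compares in place, avoiding both the copy and the String.
  boolean familyEquals(byte[] expected) {
    if (expected.length != familyLength) {
      return false;
    }
    for (int i = 0; i < familyLength; i++) {
      if (backing[familyOffset + i] != expected[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] buf = "row1cf1qual".getBytes(StandardCharsets.UTF_8);
    FamilyPeekSketch cell = new FamilyPeekSketch(buf, 4, 3);
    System.out.println(cell.familyViaCopy().equals(cell.familyInPlace()));
    System.out.println(cell.familyEquals("cf1".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Both decoding paths yield the same string; the in-place variant just skips the temporary array, which matters when the filter runs once per cell.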





[jira] [Updated] (HBASE-14778) Make block cache hit percentages not integer in the metrics system

2015-11-06 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14778:
--
Attachment: HBASE-14778.patch

> Make block cache hit percentages not integer in the metrics system
> --
>
> Key: HBASE-14778
> URL: https://issues.apache.org/jira/browse/HBASE-14778
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-14778.patch
>
>
> Once you're close to 90%+, it's hard to see a difference because getting a 
> full percent change is rare.
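
A small sketch of why an integer percentage hides movement at high hit rates. The class and method names here are hypothetical and are not taken from the HBase metrics code; the sample counts are made up for illustration.

```java
public class HitPercentSketch {
    // Integer percentage: the division truncates, so the fractional part is lost.
    static long hitPercentAsLong(long hits, long requests) {
        if (requests == 0) {
            return 0L;
        }
        return hits * 100 / requests;
    }

    // Floating-point percentage: keeps the fractional part.
    static double hitPercentAsDouble(long hits, long requests) {
        if (requests == 0) {
            return 0.0;
        }
        return hits * 100.0 / requests;
    }

    public static void main(String[] args) {
        long requests = 10_000;
        long[] hitSamples = {9_512, 9_587};
        for (long hits : hitSamples) {
            System.out.printf("hits=%d long=%d%% double=%.2f%%%n",
                hits, hitPercentAsLong(hits, requests), hitPercentAsDouble(hits, requests));
        }
    }
}
```

With these made-up samples the integer form reports the same value for both, while the double form shows the hit rate moving.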





[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994144#comment-14994144
 ] 

Elliott Clark commented on HBASE-14632:
---

+1 since existing methods use the same.

> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:302)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is related code:
> {code}
>   public long getTotalStaticIndexSize() {
>     long size = 0;
>     for (StoreFile s : this.storeEngine.getStoreFileManager().getStorefiles()) {
>       size += s.getReader().getUncompressedDataIndexSize();
>     }
>     return size;
>   }
> {code}
> Some methods, such as getStorefilesIndexSize(), guard against null 
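
One way to guard the dereference shown in the quoted method is sketched below. It is not the attached patch: StoreFile and Reader here are hypothetical stand-ins with just enough surface to run, and skipping files whose Reader is null is an assumption about the desired behaviour.

```java
import java.util.Arrays;
import java.util.List;

public class NullReaderGuardSketch {
    /** Hypothetical stand-in for a store file reader. */
    static final class Reader {
        private final long uncompressedDataIndexSize;

        Reader(long uncompressedDataIndexSize) {
            this.uncompressedDataIndexSize = uncompressedDataIndexSize;
        }

        long getUncompressedDataIndexSize() {
            return uncompressedDataIndexSize;
        }
    }

    /** Hypothetical stand-in for a store file whose reader may be null (for example, once closed). */
    static final class StoreFile {
        private final Reader reader;

        StoreFile(Reader reader) {
            this.reader = reader;
        }

        Reader getReader() {
            return reader;
        }
    }

    static long getTotalStaticIndexSize(List<StoreFile> storeFiles) {
        long size = 0;
        for (StoreFile s : storeFiles) {
            // Read the reader once into a local so the null check and the use see the same value.
            Reader r = s.getReader();
            if (r == null) {
                continue; // assumed policy: skip files without a reader instead of aborting
            }
            size += r.getUncompressedDataIndexSize();
        }
        return size;
    }

    public static void main(String[] args) {
        List<StoreFile> files = Arrays.asList(
            new StoreFile(new Reader(10)),
            new StoreFile(null),
            new StoreFile(new Reader(5)));
        System.out.println(getTotalStaticIndexSize(files));
    }
}
```

Holding the reader in a local variable avoids calling getReader() twice, so a reader that becomes null between the check and the use cannot slip through.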

[jira] [Updated] (HBASE-14778) Make block cache hit percentages not integer in the metrics system

2015-11-06 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14778:
--
Fix Version/s: 1.3.0
   1.2.0
   2.0.0
Affects Version/s: 1.2.0
   1.1.2
   Status: Patch Available  (was: Open)

> Make block cache hit percentages not integer in the metrics system
> --
>
> Key: HBASE-14778
> URL: https://issues.apache.org/jira/browse/HBASE-14778
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2, 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14778.patch
>
>
> Once you're close to 90%+, it's hard to see a difference because getting a 
> full percent change is rare.





[jira] [Updated] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14632:
---
Fix Version/s: 1.3.0
   1.2.0
   2.0.0

> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>

[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994167#comment-14994167
 ] 

Ted Yu commented on HBASE-14632:


Thanks, Elliott.

Will integrate later today if there are no more review comments.

> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>

[jira] [Created] (HBASE-14779) Revamp IntegrationTestMTTR

2015-11-06 Thread Jonathan Hsieh (JIRA)
Jonathan Hsieh created HBASE-14779:
--

 Summary: Revamp IntegrationTestMTTR
 Key: HBASE-14779
 URL: https://issues.apache.org/jira/browse/HBASE-14779
 Project: HBase
  Issue Type: Improvement
  Components: integration tests
Affects Versions: 2.0.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh


I've recently been trying to revive IntegrationTestMTTR runs and found that it 
tended not to complete in less than 6 hours and wasn't written like many of the 
other Integration Tests.

I'm going to revamp it so that a local run of it can finish in < 30 mins and to 
make it more configurable for a run against a real cluster.





[jira] [Updated] (HBASE-14779) Revamp IntegrationTestMTTR

2015-11-06 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-14779:
---
Status: Patch Available  (was: Open)

> Revamp IntegrationTestMTTR
> --
>
> Key: HBASE-14779
> URL: https://issues.apache.org/jira/browse/HBASE-14779
> Project: HBase
>  Issue Type: Improvement
>  Components: integration tests
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14779.patch
>
>
> I've recently been trying to revive IntegrationTestMTTR runs and found that 
> it tended not to complete in less than 6 hours and wasn't written like many of 
> the other Integration Tests.
> I'm going to revamp it so that a local run of it can finish in < 30 mins and 
> to make it more configurable for a run against a real cluster.





[jira] [Updated] (HBASE-14779) Revamp IntegrationTestMTTR

2015-11-06 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-14779:
---
Attachment: hbase-14779.patch

v1 to see if it passes checkstyle and such.

Currently in the process of testing command line stuff on a real cluster.

> Revamp IntegrationTestMTTR
> --
>
> Key: HBASE-14779
> URL: https://issues.apache.org/jira/browse/HBASE-14779
> Project: HBase
>  Issue Type: Improvement
>  Components: integration tests
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14779.patch
>
>
> I've recently been trying to revive IntegrationTestMTTR runs and found that 
> it tended not to complete in less than 6 hours and wasn't written like many of 
> the other Integration Tests.
> I'm going to revamp it so that a local run of it can finish in < 30 mins and 
> to make it more configurable for a run against a real cluster.





[jira] [Created] (HBASE-14780) Integration Tests that run with ChaosMonkey need to specify CFs

2015-11-06 Thread Jonathan Hsieh (JIRA)
Jonathan Hsieh created HBASE-14780:
--

 Summary: Integration Tests that run with ChaosMonkey need to 
specify CFs
 Key: HBASE-14780
 URL: https://issues.apache.org/jira/browse/HBASE-14780
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh


I've been running some IT tests and found that some failed because getcfs was 
null and didn't protect cfs that were assumed to go unmolested.





[jira] [Assigned] (HBASE-14780) Integration Tests that run with ChaosMonkey need to specify CFs

2015-11-06 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-14780:
--

Assignee: Jonathan Hsieh

> Integration Tests that run with ChaosMonkey need to specify CFs
> ---
>
> Key: HBASE-14780
> URL: https://issues.apache.org/jira/browse/HBASE-14780
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14780.patch
>
>
> I've been running some IT tests and found that some failed because getcfs was 
> null and didn't protect cfs that were assumed to go unmolested.





[jira] [Updated] (HBASE-14780) Integration Tests that run with ChaosMonkey need to specify CFs

2015-11-06 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-14780:
---
Affects Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> Integration Tests that run with ChaosMonkey need to specify CFs
> ---
>
> Key: HBASE-14780
> URL: https://issues.apache.org/jira/browse/HBASE-14780
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14780.patch
>
>
> I've been running some IT tests and found that some failed because getcfs was 
> null and didn't protect cfs that were assumed to go unmolested.





[jira] [Updated] (HBASE-14780) Integration Tests that run with ChaosMonkey need to specify CFs

2015-11-06 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-14780:
---
Attachment: hbase-14780.patch

> Integration Tests that run with ChaosMonkey need to specify CFs
> ---
>
> Key: HBASE-14780
> URL: https://issues.apache.org/jira/browse/HBASE-14780
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
> Attachments: hbase-14780.patch
>
>
> I've been running some IT tests and found that some failed because getcfs was 
> null and didn't protect cfs that were assumed to go unmolested.





[jira] [Created] (HBASE-14781) Turn per cf flushing on for ITBLL by default

2015-11-06 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-14781:
-

 Summary: Turn per cf flushing on for ITBLL by default
 Key: HBASE-14781
 URL: https://issues.apache.org/jira/browse/HBASE-14781
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark








[jira] [Updated] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14632:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>

[jira] [Updated] (HBASE-14781) Turn per cf flushing on for ITBLL by default

2015-11-06 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14781:
--
Attachment: HBASE-14781.patch

Turn the per cf flushing on by default.
Make it use configuration.
Make everything consistent.

> Turn per cf flushing on for ITBLL by default
> 
>
> Key: HBASE-14781
> URL: https://issues.apache.org/jira/browse/HBASE-14781
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-14781.patch
>
>






[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994373#comment-14994373
 ] 

Ted Yu commented on HBASE-13153:


Left a few comments on reviewboard.

Can you describe the cluster tests you have performed?

1. secure bulk loading (without replication)
2. bulk loaded hfiles replicated across secure clusters
3. bulk loaded hfiles replicated across secure HA clusters

Please try to add more unit tests for the 3 points you mentioned on Oct 28th.

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with a 
> disaster tolerance scenario. But we have encountered an issue: we use bulkload 
> very frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I suggest modifying the bulkload feature to enable bulkload to 
> both the active cluster and the backup cluster.





[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994388#comment-14994388
 ] 

Hadoop QA commented on HBASE-13153:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771067/HBASE-13153-v12.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771067

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1728 checkstyle errors (more than the master's current 1726 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  new java.lang.String[] { "Entry", "ReplicationClusterId", 
"SourceBaseNamespaceDirPath", "SourceHFileArchiveDirPath", });
+  private void validateFamiliesInHFiles(Table table, Deque 
queue) throws IOException {
++ ". Hence will load all the xml files present in its 
configured replication cluster"

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16433//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16433//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16433//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16433//console

This message is automatically generated.

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v2.patch, 
> HBASE-13153-v3.patch, HBASE-13153-v4.patch, HBASE-13153-v5.patch, 
> HBASE-13153-v6.patch, HBASE-13153-v7.patch, HBASE-13153-v8.patch, 
> HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk Load 
> Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk Load 
> Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to handle disaster 
> tolerance scenarios. But we have run into an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not generate 
> WAL entries, the data is not replicated to the backup cluster. It is 
> impractical to bulkload twice, on both the active and the backup cluster. So I 
> suggest modifying the bulkload feature so that a bulkload is applied to both 
> the active cluster and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14781) Turn per cf flushing on for ITBLL by default

2015-11-06 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14781:
--
Fix Version/s: 1.3.0
   2.0.0
Affects Version/s: 1.3.0
   1.2.0
   2.0.0
   Status: Patch Available  (was: Open)

> Turn per cf flushing on for ITBLL by default
> 
>
> Key: HBASE-14781
> URL: https://issues.apache.org/jira/browse/HBASE-14781
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14781.patch
>
>






[jira] [Created] (HBASE-14782) FuzzyRowFilter skips valid rows

2015-11-06 Thread Vladimir Rodionov (JIRA)
Vladimir Rodionov created HBASE-14782:
-

 Summary: FuzzyRowFilter skips valid rows
 Key: HBASE-14782
 URL: https://issues.apache.org/jira/browse/HBASE-14782
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov


The issue may affect not only master branch, but previous releases as well.
This is from one of our customers:
{quote}
We are experiencing a problem with the FuzzyRowFilter for HBase scan. We think 
that it is a bug. 
Fuzzy filter should pick a row if it matches filter criteria irrespective of 
other rows present in table but filter is dropping a row depending on some 
other row present in table. 


Details/Step to reproduce/Sample outputs below: 

Missing row key: \x9C\x00\x044\x00\x00\x00\x00 
Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX 


Prerequisites 
1. Create a test table. HBase shell command -- create 'fuzzytest','d' 
2. Insert some test data. HBase shell commands: 
• put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk' 
• put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk' 
• put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk' 
• put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk' 
• put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk' 
• put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
• put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in 
output because it matches filter criteria. (Refer how to run code below) 
Insert the row key causing bug: 
HBase shell command: put 
'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk' 
Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in 
output even though it still matches filter criteria. 
{quote}

Verified the issue on master.






[jira] [Commented] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-06 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994418#comment-14994418
 ] 

Appy commented on HBASE-14771:
--

Thanks [~a72877] for adding the test.
So we check that it's not null, but I am a bit confused here: what is the real 
value of the address, since "AbstractRpcClient client = createRpcClient(conf);" 
creates the client with localAddr as null?

Also, tests which do not change the functionality of non-test code 
(FifoScheduler here) are better. The added test only checks that 
Call.remoteAddress is set correctly, but if we check that 
RpcServer.getRemoteAddress works correctly, it will cover the former case and 
will also be a more robust test of functionality.
So how about trying this: change the ping method to return the value of 
RpcServer.getRemoteAddress and check that it equals some expected address.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771-V1.patch, HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null, because the Call object is 
> initialized with null. This seems to happen because RpcServer.getRemoteIp() 
> is used in the Call object constructor before the RpcServer thread-local 
> 'CurCall' is set in the CallRunner.run method:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // getRemoteIp method gets address from threadlocal 'CurCall' which 
> // gets set in CallRunner.run and calling it before this as in above case, 
> // will return null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp method in RpcServer.java seems 
> // to be fixing this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
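
The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not HBase code: CurCallSketch, CUR_CALL, addressAtConstruction and addressInsideRun are hypothetical stand-ins for RpcServer.CurCall, the Call constructor site, and CallRunner.run. It only shows that a thread-local read on the reader thread, before any handler thread has set it, yields null.

```java
final class CurCallSketch {
    // Hypothetical stand-in for the RpcServer.CurCall thread-local.
    static final ThreadLocal<String> CUR_CALL = new ThreadLocal<>();

    // Stand-in for RpcServer.getRemoteIp(): reads whatever the current thread has set.
    static String getRemoteIp() {
        return CUR_CALL.get();
    }

    // Mirrors the reported bug: the address is read on the thread that builds the
    // call, where the thread-local has never been set.
    static String addressAtConstruction() {
        return getRemoteIp();
    }

    // Mirrors CallRunner.run: the handler thread sets the thread-local first,
    // so the same read succeeds there.
    static String addressInsideRun(final String connectionAddr) {
        final String[] seen = new String[1];
        Thread handler = new Thread(() -> {
            CUR_CALL.set(connectionAddr);
            try {
                seen[0] = getRemoteIp();
            } finally {
                CUR_CALL.remove();
            }
        });
        handler.start();
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(addressAtConstruction());            // prints "null"
        System.out.println(addressInsideRun("10.0.0.7:5000"));  // prints "10.0.0.7:5000"
    }
}
```

Passing the connection's own address into the constructor, as the description proposes with this.addr, sidesteps the thread-local entirely at construction time.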





[jira] [Commented] (HBASE-14781) Turn per cf flushing on for ITBLL by default

2015-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994433#comment-14994433
 ] 

stack commented on HBASE-14781:
---

Will it do anything since we fill one cf only?

> Turn per cf flushing on for ITBLL by default
> 
>
> Key: HBASE-14781
> URL: https://issues.apache.org/jira/browse/HBASE-14781
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14781.patch
>
>






[jira] [Commented] (HBASE-14781) Turn per cf flushing on for ITBLL by default

2015-11-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994441#comment-14994441
 ] 

Elliott Clark commented on HBASE-14781:
---

We will actually fill the small and large columns now. Before this the check 
wasn't consistent so it wasn't always working.

> Turn per cf flushing on for ITBLL by default
> 
>
> Key: HBASE-14781
> URL: https://issues.apache.org/jira/browse/HBASE-14781
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14781.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14463:
---
Attachment: 14463-branch-1-v12.txt

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17
>
> Attachments: 14463-branch-1-v12.txt, GC_with_WeakObjectPool.png, 
> HBASE-14463.patch, HBASE-14463_v11.patch, HBASE-14463_v12.patch, 
> HBASE-14463_v12.patch, HBASE-14463_v2.patch, HBASE-14463_v3.patch, 
> HBASE-14463_v4.patch, HBASE-14463_v5.patch, 
> TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png, pe_use_same_keys.patch, 
> test-results.tar.gz
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In this 
> scenario we launch hundreds of YARN workers, and each worker reads 
> all features of one item (i.e. a single rowkey in HBase), so there is heavy 
> parallel reading on a single rowkey.
> We were using LruCache but recently started trying BucketCache to resolve a GC 
> issue, and, as the title says, we have observed a severe performance downgrade. 
> After some analysis we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
>   try {
> lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
> // ...
> if (bucketEntry.equals(backingMap.get(key))) {
>   // ...
>   int len = bucketEntry.getLength();
>   Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>   bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEngine.read involves an array copy, it is much more time-consuming 
> than the corresponding operation in LruCache. And since IdLock#getLockEntry 
> uses synchronized, parallel reads landing on the same bucket are 
> executed serially, which causes really bad performance.
> To resolve the problem, we propose to use ReentrantReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.
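
To illustrate the idea of a per-id read-write lock, here is a minimal sketch. It is not the IdReadWriteLock from the attached patches; PerIdReadWriteLockSketch and its methods are hypothetical names, and the map below never evicts entries, which a real implementation would have to address (the attachment names suggest the patch went through a lock-leak fix). It only shows that readers of the same id no longer exclude each other, while a writer still does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class PerIdReadWriteLockSketch {
    // One lock per id (e.g. a bucket offset). Assumption: entries are never removed here.
    private final ConcurrentMap<Long, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    ReentrantReadWriteLock getLock(long id) {
        return locks.computeIfAbsent(id, k -> new ReentrantReadWriteLock());
    }

    // Holds the read lock for `id` on the calling thread and reports whether a
    // second thread can take either the read lock (asRead=true) or the write lock.
    static boolean otherThreadCanEnter(final PerIdReadWriteLockSketch pool, final long id,
                                       final boolean asRead) {
        ReentrantReadWriteLock lock = pool.getLock(id);
        lock.readLock().lock();
        try {
            final boolean[] acquired = new boolean[1];
            Thread other = new Thread(() -> {
                ReentrantReadWriteLock same = pool.getLock(id);
                if (asRead) {
                    if (same.readLock().tryLock()) {
                        acquired[0] = true;
                        same.readLock().unlock();
                    }
                } else if (same.writeLock().tryLock()) {
                    acquired[0] = true;
                    same.writeLock().unlock();
                }
            });
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return acquired[0];
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        PerIdReadWriteLockSketch pool = new PerIdReadWriteLockSketch();
        System.out.println(otherThreadCanEnter(pool, 42L, true));   // prints "true"
        System.out.println(otherThreadCanEnter(pool, 42L, false));  // prints "false"
    }
}
```

With a synchronized-based lock, the second reader in this sketch would have had to wait; with the read lock it proceeds, which is the behaviour the description is after for parallel reads of one block.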





[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans

2015-11-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994488#comment-14994488
 ] 

James Taylor commented on HBASE-12790:
--

If HBase can provide a means for Phoenix to realize its SLAs across a fully 
loaded cluster, then we'll happily leverage it. The current HBase FIFO 
scheduled doesn't do that, so we need to either make it pluggable or provide a 
scheduler that does. The current patch solves the issue - how about we do the 
simple suggestion that [~apurtell] suggested to fix up the patch to handle 
writes too?

> Support fairness across parallelized scans
> --
>
> Key: HBASE-12790
> URL: https://issues.apache.org/jira/browse/HBASE-12790
> Project: HBase
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: ramkrishna.s.vasudevan
>  Labels: Phoenix
> Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, 
> HBASE-12790_1.patch, HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, 
> HBASE-12790_trunk_1.patch, PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
>
>
> Some HBase clients parallelize the execution of a scan to reduce latency in 
> getting back results. This can lead to starvation with a loaded cluster and 
> interleaved scans, since the RPC queue will be ordered and processed on a 
> FIFO basis. For example, suppose two clients, A and B, submit largish 
> scans at the same time. Say each scan is broken down into 100 scans by the 
> client (broken down into equal depth chunks along the row key), and the 100 
> scans of client A are queued first, followed immediately by the 100 scans of 
> client B. In this case, client B will be starved out of getting any results 
> back until the scans for client A complete.
> One solution to this is to use the attached AbstractRoundRobinQueue instead 
> of the standard FIFO queue. The queue to be used could be (maybe it already 
> is) configurable based on a new config parameter. Using this queue would 
> require the client to have the same identifier for all of the 100 parallel 
> scans that represent a single logical scan from the client's point of view. 
> With this information, the round robin queue would pick off a task from the 
> queue in a round robin fashion (instead of a strictly FIFO manner) to prevent 
> starvation over interleaved parallelized scans.
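
The round-robin idea in the description can be sketched as follows. This is not the attached AbstractRoundRobinQueue, whose contents are not shown here; RoundRobinQueueSketch and the String group identifier are assumptions for illustration, and the class is not thread-safe. Tasks that share an identifier go into one sub-queue, and poll() rotates across sub-queues instead of draining the oldest one first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class RoundRobinQueueSketch<T> {
    // Per-group FIFO queues, keyed by the identifier shared by one logical scan.
    private final Map<String, Deque<T>> perGroup = new HashMap<>();
    // Groups that currently have pending tasks, in the order they will be served.
    private final Deque<String> rotation = new ArrayDeque<>();

    void offer(String groupId, T task) {
        Deque<T> queue = perGroup.get(groupId);
        if (queue == null) {
            queue = new ArrayDeque<>();
            perGroup.put(groupId, queue);
            rotation.addLast(groupId);
        }
        queue.addLast(task);
    }

    // Takes one task from the group at the head of the rotation, then moves that
    // group to the back so the next poll serves a different group.
    T poll() {
        String groupId = rotation.pollFirst();
        if (groupId == null) {
            return null;
        }
        Deque<T> queue = perGroup.get(groupId);
        T task = queue.pollFirst();
        if (queue.isEmpty()) {
            perGroup.remove(groupId);
        } else {
            rotation.addLast(groupId);
        }
        return task;
    }

    public static void main(String[] args) {
        RoundRobinQueueSketch<String> q = new RoundRobinQueueSketch<>();
        q.offer("A", "A1");
        q.offer("A", "A2");
        q.offer("A", "A3");
        q.offer("B", "B1");
        q.offer("B", "B2");
        StringBuilder order = new StringBuilder();
        for (String t = q.poll(); t != null; t = q.poll()) {
            order.append(t).append(' ');
        }
        System.out.println(order.toString().trim()); // prints "A1 B1 A2 B2 A3"
    }
}
```

In the two-client example above, client B's first chunk is served right after client A's first chunk, rather than after all 100 of A's chunks.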





[jira] [Commented] (HBASE-14759) Avoid using Math.abs when selecting SyncRunner in FSHLog

2015-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994495#comment-14994495
 ] 

Enis Soztutar commented on HBASE-14759:
---

bq. I think this will update syncRunnerIndex? Enis Soztutar
Thanks Duo. I have missed the first part of the change. 

+1 for the patch. 

> Avoid using Math.abs when selecting SyncRunner in FSHLog
> 
>
> Key: HBASE-14759
> URL: https://issues.apache.org/jira/browse/HBASE-14759
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14759.patch
>
>
> {code:title=FSHLog.java}
> int index = Math.abs(this.syncRunnerIndex++) % this.syncRunners.length;
>   try {
> this.syncRunners[index].offer(sequence, this.syncFutures, 
> this.syncFuturesCount);
>   } catch (Exception e) {
> // Should NEVER get here.
> requestLogRoll();
> this.exception = new DamagedWALException("Failed offering sync", 
> e);
>   }
> {code}
> Math.abs will return Integer.MIN_VALUE if you pass Integer.MIN_VALUE in since 
> the actual absolute value of Integer.MIN_VALUE is out of range.
> I think {{this.syncRunnerIndex++}} will overflow eventually if we keep the 
> regionserver running for enough time.
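
The overflow case described above is easy to check in plain Java. The sketch below is an illustration only: SyncRunnerIndexSketch and its methods are hypothetical, and Math.floorMod (Java 8+) is just one overflow-safe way to pick an index, not necessarily what the attached patch does.

```java
final class SyncRunnerIndexSketch {
    // Same shape as the quoted FSHLog line: goes negative once the counter
    // wraps around to Integer.MIN_VALUE.
    static int unsafeIndex(int counter, int length) {
        return Math.abs(counter) % length;
    }

    // Math.floorMod returns a value in [0, length) for any int counter and positive length.
    static int safeIndex(int counter, int length) {
        return Math.floorMod(counter, length);
    }

    public static void main(String[] args) {
        int length = 5; // assumed number of sync runners, for illustration
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE); // prints "true"
        System.out.println(unsafeIndex(Integer.MIN_VALUE, length));           // prints "-3"
        System.out.println(safeIndex(Integer.MIN_VALUE, length));             // prints "2"
    }
}
```

A negative index like the -3 above would make the array access throw ArrayIndexOutOfBoundsException, which is the failure the issue is guarding against.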



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12072) Standardize retry handling for master operations

2015-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994506#comment-14994506
 ] 

Enis Soztutar commented on HBASE-12072:
---

bq. To confirm, we add unreleased version corresponding to every branch the 
change was pushed, right?
Yes. The committer who commits the code marks the next version to be released 
from that branch as the fixVersion. At the time of the commit, that particular 
fixVersion would not be released yet. 
bq. If yes, seems like 0.99.2 was made from 1.0 branch.
Yes, we forked branch-1.0 and did all the 0.99.x and 1.0.x 
releases from that branch. 
bq. Also let me know if I should revert the changes i made.
foo

> Standardize retry handling for master operations
> 
>
> Key: HBASE-12072
> URL: https://issues.apache.org/jira/browse/HBASE-12072
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 1.0.0, 2.0.0, 0.99.2
>
> Attachments: 12072-v1.txt, 12072-v2.txt, hbase-12072_v1.patch, 
> hbase-12072_v2.patch, hbase-12072_v2.patch, hbase-12072_v3.patch
>
>
> For master requests, there are two retry mechanisms in effect. The first one 
> is from HBaseAdmin.executeCallable() 
> {code}
>   private  V executeCallable(MasterCallable callable) throws 
> IOException {
> RpcRetryingCaller caller = rpcCallerFactory.newCaller();
> try {
>   return caller.callWithRetries(callable);
> } finally {
>   callable.close();
> }
>   }
> {code}
> And inside, the other one is from StubMaker.makeStub():
> {code}
> /**
>* Create a stub against the master.  Retry if necessary.
>* @return A stub to do intf against the master
>* @throws MasterNotRunningException
>*/
>   @edu.umd.cs.findbugs.annotations.SuppressWarnings 
> (value="SWL_SLEEP_WITH_LOCK_HELD")
>   Object makeStub() throws MasterNotRunningException {
> {code}
> The tests will just hang for 10 min * 35 ~= 6hours. 
> {code}
> 2014-09-23 16:19:05,151 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 1 of 35 
> failed; retrying after sleep of 100, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,253 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 2 of 35 
> failed; retrying after sleep of 200, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,456 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 3 of 35 
> failed; retrying after sleep of 300, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,759 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 4 of 35 
> failed; retrying after sleep of 500, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:06,262 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 5 of 35 
> failed; retrying after sleep of 1008, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:07,273 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 6 of 35 
> failed; retrying after sleep of 2011, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:09,286 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 7 of 35 
> failed; retrying after sleep of 4012, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:13,303 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 8 of 35 
> failed; retrying after sleep of 10033, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:23,343 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 9 of 35 
> failed; retrying after sleep of 10089, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:33,439 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 10 of 
> 35 failed; retrying after sleep of 10027, exception=java.io.IOException: 
> Can't get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:43,473 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 11 of 
> 35 failed; retrying after sleep of 10004, exception=java.io.IOException: 
> Can't get master address fr

[jira] [Comment Edited] (HBASE-12072) Standardize retry handling for master operations

2015-11-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994506#comment-14994506
 ] 

Enis Soztutar edited comment on HBASE-12072 at 11/6/15 9:48 PM:


bq. To confirm, we add unreleased version corresponding to every branch the 
change was pushed, right?
Yes. The committer who commits the code marks the next version to be released 
from that branch as the fixVersion. At the time of the commit, that particular 
fixVersion would not be released yet. 
bq. If yes, seems like 0.99.2 was made from 1.0 branch.
Yes, we forked branch-1.0 and did all the 0.99.x and 1.0.x 
releases from that branch. 
bq. Also let me know if I should revert the changes i made.
That would be good. Thanks. 

Let me know if there is anything not clear. 


was (Author: enis):
bq. To confirm, we add unreleased version corresponding to every branch the 
change was pushed, right?
Yes. The committer who commits the code marks the next version to be released 
from that branch as the fixVersion. At the time of the commit, that particular 
fixVersion would not be released yet. 
bq. If yes, seems like 0.99.2 was made from 1.0 branch.
Yes, we forked branch-1.0 and did all the 0.99.x and 1.0.x 
releases from that branch. 
bq. Also let me know if I should revert the changes i made.
foo

> Standardize retry handling for master operations
> 
>
> Key: HBASE-12072
> URL: https://issues.apache.org/jira/browse/HBASE-12072
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 1.0.0, 2.0.0, 0.99.2
>
> Attachments: 12072-v1.txt, 12072-v2.txt, hbase-12072_v1.patch, 
> hbase-12072_v2.patch, hbase-12072_v2.patch, hbase-12072_v3.patch
>
>
> For master requests, there are two retry mechanisms in effect. The first one 
> is from HBaseAdmin.executeCallable() 
> {code}
>   private  V executeCallable(MasterCallable callable) throws 
> IOException {
> RpcRetryingCaller caller = rpcCallerFactory.newCaller();
> try {
>   return caller.callWithRetries(callable);
> } finally {
>   callable.close();
> }
>   }
> {code}
> And inside, the other one is from StubMaker.makeStub():
> {code}
> /**
>* Create a stub against the master.  Retry if necessary.
>* @return A stub to do intf against the master
>* @throws MasterNotRunningException
>*/
>   @edu.umd.cs.findbugs.annotations.SuppressWarnings 
> (value="SWL_SLEEP_WITH_LOCK_HELD")
>   Object makeStub() throws MasterNotRunningException {
> {code}
> The tests will just hang for 10 min * 35 ~= 6hours. 
> {code}
> 2014-09-23 16:19:05,151 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 1 of 35 
> failed; retrying after sleep of 100, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,253 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 2 of 35 
> failed; retrying after sleep of 200, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,456 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 3 of 35 
> failed; retrying after sleep of 300, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:05,759 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 4 of 35 
> failed; retrying after sleep of 500, exception=java.io.IOException: Can't get 
> master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:06,262 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 5 of 35 
> failed; retrying after sleep of 1008, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:07,273 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 6 of 35 
> failed; retrying after sleep of 2011, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:09,286 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 7 of 35 
> failed; retrying after sleep of 4012, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:13,303 INFO  [main] 
> client.ConnectionManager$HConnectionImplementation: getMaster attempt 8 of 35 
> failed; retrying after sleep of 10033, exception=java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 2014-09-23 16:19:23,343 INFO  [main] 
> client.ConnectionManager$HConne

[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans

2015-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994538#comment-14994538
 ] 

stack commented on HBASE-12790:
---

bq. Scan attributes alone will not do IMHO because the queues cannot do this 
round robin for now.

They can round robin over the Scans but you are saying the scheduler needs to 
distinguish at a higher level than per Scan? It can't arbitrate on Scanner 
lease or a Scanner id attribute? Scheduler needs to make sure that we schedule 
scans from different clients... We could just schedule the same client over and 
over and shut out all others?

bq. Let me check that more closely in terms of phoenix code also.

Thanks.

bq. ...or provide a scheduler that does.

I agree with this bit. Long scans or a single client hogging server resources 
is broken for everyone. Let's fix it for all rather than just for Phoenix?

bq.  how about we do the simple suggestion that Andrew Purtell suggested to 
fix up the patch to handle writes too?

Because it pulls in an alien notion of 'groups', a tiering/complication that we 
can hopefully do without.




> Support fairness across parallelized scans
> --
>
> Key: HBASE-12790
> URL: https://issues.apache.org/jira/browse/HBASE-12790
> Project: HBase
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: ramkrishna.s.vasudevan
>  Labels: Phoenix
> Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, 
> HBASE-12790_1.patch, HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, 
> HBASE-12790_trunk_1.patch, PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
>
>
> Some HBase clients parallelize the execution of a scan to reduce latency in 
> getting back results. This can lead to starvation with a loaded cluster and 
> interleaved scans, since the RPC queue will be ordered and processed on a 
> FIFO basis. For example, suppose two clients, A and B, submit largish 
> scans at the same time. Say each scan is broken down into 100 scans by the 
> client (broken down into equal depth chunks along the row key), and the 100 
> scans of client A are queued first, followed immediately by the 100 scans of 
> client B. In this case, client B will be starved out of getting any results 
> back until the scans for client A complete.
> One solution to this is to use the attached AbstractRoundRobinQueue instead 
> of the standard FIFO queue. The queue to be used could be (maybe it already 
> is) configurable based on a new config parameter. Using this queue would 
> require the client to have the same identifier for all of the 100 parallel 
> scans that represent a single logical scan from the client's point of view. 
> With this information, the round robin queue would pick off a task from the 
> queue in a round robin fashion (instead of a strictly FIFO manner) to prevent 
> starvation over interleaved parallelized scans.





[jira] [Commented] (HBASE-14717) Enable_table_replication should not create table in peer cluster if specified few tables added in peer

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994547#comment-14994547
 ] 

Hadoop QA commented on HBASE-14717:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12771069/HBASE-14717%281%29.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771069

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

  {color:red}-1 core zombie tests{color}.  There are possible 5 zombie 
test(s): at 
org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat.testWithMapReduceImpl(TestTableSnapshotInputFormat.java:225)
at 
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatTestBase.testWithMapReduce(TableSnapshotInputFormatTestBase.java:164)
at 
org.apache.hadoop.hbase.mapred.TestTableSnapshotInputFormat.testWithMapReduceMultiRegion(TestTableSnapshotInputFormat.java:141)
at 
org.apache.hadoop.hbase.mapred.TestTableInputFormat.testTableRecordReaderScannerFailTwice(TestTableInputFormat.java:290)
at 
org.apache.hadoop.hbase.backup.TestHFileArchiving.testArchiveOnTableFamilyDelete(TestHFileArchiving.java:322)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16434//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16434//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16434//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16434//console

This message is automatically generated.

> Enable_table_replication should not create table in peer cluster if specified 
> few tables added in peer
> --
>
> Key: HBASE-14717
> URL: https://issues.apache.org/jira/browse/HBASE-14717
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.0.2
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Ashish Singhi
> Attachments: HBASE-14717(1).patch, HBASE-14717.patch
>
>
> For a peer, only the user-specified tables should be created, but the 
> enable_table_replication command does not honour that.
> eg:
> like peer1 : t1:cf1, t2
> create 't3', 'd'
> enable_table_replication 't3' > should not create t3 in peer1





[jira] [Commented] (HBASE-14712) MasterProcWALs never clean up

2015-11-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994549#comment-14994549
 ] 

Elliott Clark commented on HBASE-14712:
---

Are the changes to the WAL procedure bit set compatible?

> MasterProcWALs never clean up
> -
>
> Key: HBASE-14712
> URL: https://issues.apache.org/jira/browse/HBASE-14712
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Matteo Bertozzi
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14712-v0.patch, HBASE-14712-v1.patch, state.tar.gz
>
>
> MasterProcWALs directory grows pretty much un-bounded. Because of that when 
> master failover happens the NN is flooded with connections and everything 
> grinds to a halt.





[jira] [Commented] (HBASE-14712) MasterProcWALs never clean up

2015-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994552#comment-14994552
 ] 

Matteo Bertozzi commented on HBASE-14712:
-

Yes, that is just in-memory stuff. Nothing changed in the file.

> MasterProcWALs never clean up
> -
>
> Key: HBASE-14712
> URL: https://issues.apache.org/jira/browse/HBASE-14712
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Matteo Bertozzi
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14712-v0.patch, HBASE-14712-v1.patch, state.tar.gz
>
>
> MasterProcWALs directory grows pretty much un-bounded. Because of that when 
> master failover happens the NN is flooded with connections and everything 
> grinds to a halt.





[jira] [Commented] (HBASE-14767) Remove deprecated functions from HBaseAdmin

2015-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994555#comment-14994555
 ] 

Matteo Bertozzi commented on HBASE-14767:
-

+1

> Remove deprecated functions from HBaseAdmin
> ---
>
> Key: HBASE-14767
> URL: https://issues.apache.org/jira/browse/HBASE-14767
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-14767-master-v2.patch, 
> HBASE-14767-master-v3.patch, HBASE-14767-master-v4.patch, 
> HBASE-14767-master.patch
>
>
> Many functions in HBaseAdmin are marked deprecated. Removing them. 





[jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException

2015-11-06 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994559#comment-14994559
 ] 

Appy commented on HBASE-14777:
--

+1
Please consider adding a unit test for the same.

> Replication fails with IndexOutOfBoundsException
> 
>
> Key: HBASE-14777
> URL: https://issues.apache.org/jira/browse/HBASE-14777
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Bhupendra Kumar Jain
>Assignee: Bhupendra Kumar Jain
>Priority: Critical
> Attachments: HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>   at java.util.ArrayList.rangeCheck(Unknown Source)
>   at java.util.ArrayList.remove(Unknown Source)
>   at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication 
> entries list. 
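
A minimal sketch of how this kind of failure can arise, assuming (the report does not show the code) that entries are removed from an ArrayList by positions that were computed before earlier removals shrank the list. The class and method names below are hypothetical and are not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code in HBaseInterClusterReplicationEndpoint.
public class RemovalSketch {
    // Problematic: positions are removed in ascending order, so every removal
    // shifts the remaining elements left and later positions become stale.
    static <T> void removeAscending(List<T> list, int[] positions) {
        for (int p : positions) {
            list.remove(p); // can throw IndexOutOfBoundsException once the list has shrunk
        }
    }

    // Safer: remove from the highest position down, so lower positions stay valid.
    static <T> void removeDescending(List<T> list, int[] positions) {
        int[] sorted = positions.clone();
        Arrays.sort(sorted);
        for (int i = sorted.length - 1; i >= 0; i--) {
            list.remove(sorted[i]);
        }
    }
}
```

With a two-element list and positions {0, 1}, the ascending variant fails on the second removal because the list then has size 1, which is the same "Index: 1, Size: 1" shape as in the trace above.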





[jira] [Created] (HBASE-14783) Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-14783:
--

 Summary: Master aborts when downgrading from 1.3 to 1.1
 Key: HBASE-14783
 URL: https://issues.apache.org/jira/browse/HBASE-14783
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


I was running ITBLL with 1.3 deployed on a 6 node cluster.

Then I stopped the cluster, deployed the 1.1 release and tried to start the cluster.
However, the master failed to start due to:
{code}
2015-11-06 00:58:40,351 FATAL [eval-test-2:2.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: The procedure class org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure must be accessible and have an empty constructor
  at org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:548)
  at org.apache.hadoop.hbase.procedure2.Procedure.convert(Procedure.java:640)
  at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.read(ProcedureWALFormatReader.java:105)
  at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:82)
  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:298)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:275)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:434)
  at org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1208)
  at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1107)
  at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:694)
  at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:186)
  at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1713)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:191)
  at org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:536)
  ... 12 more
{code}
The cause was that ServerCrashProcedure, written to some WAL file under 
MasterProcWALs during the first run, is absent from the 1.1 release.

After a brief discussion with Stephen, I am logging this JIRA to solicit 
discussion on how the customer experience can be improved when a downgrade of 
HBase is performed.





[jira] [Updated] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14783:
---
Summary: Proc-V2: Master aborts when downgrading from 1.3 to 1.1  (was: 
Master aborts when downgrading from 1.3 to 1.1)

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu





[jira] [Commented] (HBASE-14712) MasterProcWALs never clean up

2015-11-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994567#comment-14994567
 ] 

Elliott Clark commented on HBASE-14712:
---

Sounds good. +1 from me.

> MasterProcWALs never clean up
> -
>
> Key: HBASE-14712
> URL: https://issues.apache.org/jira/browse/HBASE-14712
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: Matteo Bertozzi
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14712-v0.patch, HBASE-14712-v1.patch, state.tar.gz
>
>
> MasterProcWALs directory grows pretty much un-bounded. Because of that when 
> master failover happens the NN is flooded with connections and everything 
> grinds to a halt.





[jira] [Assigned] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang reassigned HBASE-14783:
--

Assignee: Stephen Yuan Jiang

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang





[jira] [Commented] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994568#comment-14994568
 ] 

Stephen Yuan Jiang commented on HBASE-14783:


Thanks, [~tedyu], for reporting this.

We keep adding new procedure types in new releases. If a customer does not 
like the new release after an upgrade and downgrades, they could hit this 
issue (a newly introduced procedure type cannot be replayed in the older 
release).

One solution is to ignore the ClassNotFoundException during procedure load, 
log a warning or error, and continue.
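
The proposal above could look roughly like the following. This is a sketch under stated assumptions, not the ProcedureExecutor code: the class name LenientProcedureLoader, the loadAll method and the use of java.util.logging are all hypothetical, and whether skipping an entry is safe depends on the state the cluster was left in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical loader; it only illustrates "warn and continue" on unknown classes.
public class LenientProcedureLoader {
    private static final Logger LOG =
        Logger.getLogger(LenientProcedureLoader.class.getName());

    // Instantiates each class through its no-arg constructor.
    // Classes that are not on the classpath are skipped with a warning.
    static List<Object> loadAll(List<String> classNames) {
        List<Object> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<?> clazz = Class.forName(name);
                loaded.add(clazz.getDeclaredConstructor().newInstance());
            } catch (ClassNotFoundException e) {
                // e.g. a procedure type written by a newer release
                LOG.warning("Skipping procedure of unknown class " + name);
            } catch (ReflectiveOperationException e) {
                // Present but not instantiable: still treated as a hard error here.
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
        return loaded;
    }
}
```

Only the missing-class case is downgraded to a warning; a class that exists but cannot be constructed still aborts the load in this sketch.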

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu





[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994591#comment-14994591
 ] 

Hudson commented on HBASE-14632:


FAILURE: Integrated in HBase-1.3 #352 (See 
[https://builds.apache.org/job/HBase-1.3/352/])
HBASE-14632 Region server aborts due to unguarded dereference of Reader (tedyu: 
rev 0e2e5d328071a9f5a116a9fb0ed1df7bc1f562ab)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:302)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is related code:
> {code}
>   public long getTotalStaticIndex

[jira] [Commented] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994605#comment-14994605
 ] 

Matteo Bertozzi commented on HBASE-14783:
-

There are 3 problems here:
 - After an unclean shutdown your cluster is in an inconsistent state; starting 
the master and ignoring the exception is a bad idea. You have to run hbck, or 
you can restart the cluster and do a clean shutdown before the downgrade.
 - After a clean shutdown and a restart with a lower version you will not have 
any procedures running, so there is nothing to load and you will not hit this 
problem.

The 3rd is how we load: completed procedures (but not yet deleted) are 
loaded using convert(), which creates a Procedure instance, but we don't really 
need that. In fact we remove that instance later when we realize that we just 
need the result. 

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang





[jira] [Commented] (HBASE-14778) Make block cache hit percentages not integer in the metrics system

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994611#comment-14994611
 ] 

Hadoop QA commented on HBASE-14778:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771075/HBASE-14778.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771075

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1727 checkstyle errors (more than the master's current 1726 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16435//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16435//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16435//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16435//console

This message is automatically generated.

> Make block cache hit percentages not integer in the metrics system
> --
>
> Key: HBASE-14778
> URL: https://issues.apache.org/jira/browse/HBASE-14778
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14778.patch
>
>
> Once you're close to the 90%+ it's hard to see a difference because getting a 
> full percent change is rare.
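
To illustrate the point with made-up numbers, here is a small sketch comparing an integer percentage with a floating-point one. The class and method names are hypothetical and are not the HBase metrics code.

```java
// Hypothetical helper; it only contrasts integer and floating-point percentages.
public class HitPercentSketch {
    // Integer percentage: the fractional part is truncated.
    static long hitPercentInt(long hits, long requests) {
        return requests == 0 ? 0 : (hits * 100) / requests;
    }

    // Floating-point percentage: fractions of a percent are preserved.
    static double hitPercentDouble(long hits, long requests) {
        return requests == 0 ? 0.0 : (hits * 100.0) / requests;
    }
}
```

For 9,510 and 9,590 hits out of 10,000 requests the integer form reports 95 in both cases, while the floating-point form distinguishes roughly 95.1 from 95.9.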





[jira] [Updated] (HBASE-14716) Detection of orphaned table znode should cover table in Enabled state

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14716:
---
Labels: hbck  (was: )

> Detection of orphaned table znode should cover table in Enabled state
> -
>
> Key: HBASE-14716
> URL: https://issues.apache.org/jira/browse/HBASE-14716
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbck
> Attachments: 14716-branch-1-v1.txt
>
>
> HBASE-12070 introduced a fix for orphaned table znodes where the table doesn't 
> have an entry in hbase:meta.
> When Stephen and I investigated a rolling upgrade failure,
> {code}
> 2015-10-27 18:21:10,668 WARN  [ProcedureExecutorThread-3] 
> procedure.CreateTableProcedure: The table smoketest does not exist in meta 
> but has a znode. run hbck to fix inconsistencies.
> {code}
> we found that the orphaned table znode corresponded to a table in Enabled state.
> Therefore running hbck didn't report the inconsistency.
> Detection of orphaned table znodes should cover this case.





[jira] [Comment Edited] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994605#comment-14994605
 ] 

Matteo Bertozzi edited comment on HBASE-14783 at 11/6/15 11:03 PM:
---

There are 3 problems here:
 - After an unclean shutdown your cluster is in an inconsistent state; starting 
the master and ignoring the exception is a bad idea. You have to run hbck, or 
you can restart the cluster and do a clean shutdown before the downgrade.
 - After a clean shutdown and a restart with a lower version you will not have 
any procedures running, so there is nothing to load and you will not hit this 
problem.

The 3rd problem is related to how we load: completed procedures (but not yet 
deleted) are loaded using convert(), which creates a Procedure instance, but we 
don't really need that. In fact we remove that instance later when we realize 
that we just need the result. So even in case of a clean shutdown we may get 
the exception, because we are trying to call convert().


was (Author: mbertozzi):
there are 3 problems here
 - on a not clean shutdown your cluster is in a non consistent state, starting 
the master and ignoring the exception is a bad idea. you have run hbck or you 
can restart the cluster and do a clean shutdown before the downgrade.
 - on a clean shutdown and restart with lower version you will not have 
procedure running. so nothing to load and you'll not hit this problem

the 3rd is how do we load, completed procedures (but not yet deleted) are 
loaded using convert() which creates a Procedure instance, but we don't really 
need that. in fact we remove that instance later when we realize that we just 
need the result. 

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang
>
> I was running ITBLL with 1.3 deployed on a 6 node cluster.
> Then I stopped the cluster, deployed 1.1 release and tried to start cluster.
> However, master failed to start due to:
> {code}
> 2015-11-06 00:58:40,351 FATAL [eval-test-2:2.activeMasterManager] 
> master.HMaster: Failed to become active master
> java.io.IOException: The procedure class 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure must be 
> accessible and have an empty constructor
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:548)
>   at org.apache.hadoop.hbase.procedure2.Procedure.convert(Procedure.java:640)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.read(ProcedureWALFormatReader.java:105)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:82)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:298)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:275)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:434)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1208)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1107)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:694)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:186)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1713)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:191)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:536)
>   ... 12 more
> {code}
> The cause was that ServerCrashProcedure, written in some WAL file under 
> MasterProcWALs from first run, was absent in 1.1 release.
> After a brief discussion with Stephen, I am logging this JIRA to solicit 
> discussion on how customer experience can be improved if downgrade of hbase 
> is performed.





[jira] [Commented] (HBASE-14779) Revamp IntegrationTestMTTR

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994651#comment-14994651
 ] 

Hadoop QA commented on HBASE-14779:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771082/hbase-14779.patch
  against master branch at commit bfa36891901b96b95d82f5307642c35fd2b9f534.
  ATTACHMENT ID: 12771082

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16436//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16436//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16436//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16436//console

This message is automatically generated.

> Revamp IntegrationTestMTTR
> --
>
> Key: HBASE-14779
> URL: https://issues.apache.org/jira/browse/HBASE-14779
> Project: HBase
>  Issue Type: Improvement
>  Components: integration tests
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14779.patch
>
>
> I've recently been trying to revive IntegrationTestMTTR runs and found that 
> it tended not to complete in less than 6 hours and wasn't written like many of 
> the other Integration Tests.
> I'm going to revamp it so that a local run of it can finish in < 30 mins, and 
> make it more configurable for a run against a real cluster.





[jira] [Commented] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994735#comment-14994735
 ] 

Hudson commented on HBASE-14706:


FAILURE: Integrated in HBase-1.1-JDK8 #1675 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1675/])
HBASE-14706 RegionLocationFinder should return multiple servernames by (tedyu: 
rev 7098e8112202fd855e13dc69dec21a7d3006c4b6)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/RegionLocationFinder.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionLocationFinder.java


> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch
>
>
> Multiple RS can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}
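
A minimal sketch of the kind of mapping the description asks for, assuming a one-to-many map from hostname to servers. `HostPort` is a hypothetical stand-in for HBase's `ServerName`, and `groupByHost` is an illustrative name; this is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ServerName: only the fields this sketch needs.
final class HostPort {
  final String hostname;
  final int port;

  HostPort(String hostname, int port) {
    this.hostname = hostname;
    this.port = port;
  }

  @Override
  public String toString() {
    return hostname + ":" + port;
  }
}

final class HostToServersSketch {
  // Group servers by hostname so a host running several region servers keeps all of them,
  // instead of each put() overwriting the previous entry for that host.
  static Map<String, List<HostPort>> groupByHost(List<HostPort> servers) {
    Map<String, List<HostPort>> byHost = new HashMap<>();
    for (HostPort sn : servers) {
      byHost.computeIfAbsent(sn.hostname, h -> new ArrayList<>()).add(sn);
    }
    return byHost;
  }

  public static void main(String[] args) {
    List<HostPort> servers = Arrays.asList(
        new HostPort("host-a", 16020), new HostPort("host-a", 16021), new HostPort("host-b", 16020));
    System.out.println(groupByHost(servers));
  }
}
```

With a plain `HashMap<String, ServerName>`, the second server on `host-a` would replace the first; with the grouped map both remain visible to whatever computes locality.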





[jira] [Commented] (HBASE-14780) Integration Tests that run with ChaosMonkey need to specify CFs

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994764#comment-14994764
 ] 

Hadoop QA commented on HBASE-14780:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771094/hbase-14780.patch
  against master branch at commit 263a0adf79105b9dc166e21c3f5159ade6e2d0a7.
  ATTACHMENT ID: 12771094

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

  {color:red}-1 core zombie tests{color}.  There are possible 5 zombie 
test(s): at 
org.apache.hadoop.hbase.mapreduce.TestImportExport.testImport94Table(TestImportExport.java:246)
at 
org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScan(TestMultiTableInputFormat.java:247)
at 
org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScanEmptyToAPP(TestMultiTableInputFormat.java:186)
at 
org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTableWithTimestamp(TestImportTsv.java:139)
at 
org.apache.hadoop.hbase.mapreduce.TestSyncTable.testSyncTable(TestSyncTable.java:93)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16437//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16437//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16437//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16437//console

This message is automatically generated.

> Integration Tests that run with ChaosMonkey need to specify CFs
> ---
>
> Key: HBASE-14780
> URL: https://issues.apache.org/jira/browse/HBASE-14780
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Attachments: hbase-14780.patch
>
>
> Been running some IT tests and found that some failed because getcfs was null 
> and didn't protect cfs that were assumed to go unmolested.





[jira] [Commented] (HBASE-14774) Raise the font size on high-DPI small-screen devices like iphone 6+

2015-11-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994765#comment-14994765
 ] 

Jonathan Hsieh commented on HBASE-14774:


so the image I sent was from before the font change.  It looks ok to me (though 
I wish the menus could be in one of the tab things common on other mobile 
sites).  maybe a follow-on issue for that.

> Raise the font size on high-DPI small-screen devices like iphone 6+
> ---
>
> Key: HBASE-14774
> URL: https://issues.apache.org/jira/browse/HBASE-14774
> Project: HBase
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.0.0
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
> Fix For: 2.0.0
>
> Attachments: HBASE-14774.patch, image.jpg
>
>
> On iPads and things like that, the website looks fine. But the fonts are too 
> small on high-DPI small screens. It's tiny on my iPhone 6+.





[jira] [Created] (HBASE-14784) Port conflict is not resolved in HBaseTestingUtility.randomFreePort()

2015-11-06 Thread Youngjoon Kim (JIRA)
Youngjoon Kim created HBASE-14784:
-

 Summary: Port conflict is not resolved in 
HBaseTestingUtility.randomFreePort()
 Key: HBASE-14784
 URL: https://issues.apache.org/jira/browse/HBASE-14784
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.1.2
Reporter: Youngjoon Kim
Assignee: Youngjoon Kim
Priority: Minor


If takenRandomPorts.contains(port) == true, there is a port conflict, so 
randomFreePort() should rerun the loop. But the continue statement jumps to the 
loop condition, and because port != 0 the loop exits with the conflicting port.

{code:title=hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java}
public static int randomFreePort() {
  int port = 0;
  do {
    port = randomPort();
    if (takenRandomPorts.contains(port)) {
      continue;
    }
    takenRandomPorts.add(port);

    ...

  } while (port == 0);
  return port;
}
{code}
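
One possible shape for a loop that really does retry on conflict, as a sketch only. It assumes the candidate ports come from an injected `IntSupplier` (a hypothetical stand-in for randomPort()) and it leaves out the part of the real method elided as "..." above; it is not the attached patch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntSupplier;

final class FreePortSketch {
  private final Set<Integer> takenPorts = new HashSet<>();
  // Hypothetical source of candidate ports; stands in for randomPort().
  private final IntSupplier candidatePorts;

  FreePortSketch(IntSupplier candidatePorts) {
    this.candidatePorts = candidatePorts;
  }

  int randomFreePort() {
    while (true) {
      int port = candidatePorts.getAsInt();
      // Set.add returns false if the port was already handed out, so the loop retries
      // instead of falling through to a "port == 0" check that can never be true.
      if (takenPorts.add(port)) {
        return port;
      }
    }
  }

  public static void main(String[] args) {
    int[] candidates = {5000, 5000, 5001};
    int[] next = {0};
    FreePortSketch sketch = new FreePortSketch(() -> candidates[next[0]++]);
    System.out.println(sketch.randomFreePort()); // prints 5000
    System.out.println(sketch.randomFreePort()); // prints 5001: the repeated 5000 is skipped
  }
}
```

Keeping the do/while and resetting port to 0 before the continue would also work; the explicit retry loop just makes the intent harder to miss.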





[jira] [Commented] (HBASE-14759) Avoid using Math.abs when selecting SyncRunner in FSHLog

2015-11-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994772#comment-14994772
 ] 

Duo Zhang commented on HBASE-14759:
---

OK, Thanks. Let me commit this.

> Avoid using Math.abs when selecting SyncRunner in FSHLog
> 
>
> Key: HBASE-14759
> URL: https://issues.apache.org/jira/browse/HBASE-14759
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14759.patch
>
>
> {code:title=FSHLog.java}
> int index = Math.abs(this.syncRunnerIndex++) % this.syncRunners.length;
> try {
>   this.syncRunners[index].offer(sequence, this.syncFutures, this.syncFuturesCount);
> } catch (Exception e) {
>   // Should NEVER get here.
>   requestLogRoll();
>   this.exception = new DamagedWALException("Failed offering sync", e);
> }
> {code}
> Math.abs will return Integer.MIN_VALUE if you pass Integer.MIN_VALUE in since 
> the actual absolute value of Integer.MIN_VALUE is out of range.
> I think {{this.syncRunnerIndex++}} will overflow eventually if we keep the 
> regionserver running for enough time.
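
A small sketch of the overflow case described above and two common ways to keep the index non-negative. The method names are illustrative and the committed patch may well do something different.

```java
final class SyncRunnerIndexSketch {
  // The pattern from the report: negative when counter == Integer.MIN_VALUE.
  static int absIndex(int counter, int length) {
    return Math.abs(counter) % length;
  }

  // Clear the sign bit before taking the remainder.
  static int maskedIndex(int counter, int length) {
    return (counter & Integer.MAX_VALUE) % length;
  }

  // Math.floorMod (Java 8+) returns a result in [0, length) for a positive length.
  static int floorModIndex(int counter, int length) {
    return Math.floorMod(counter, length);
  }

  public static void main(String[] args) {
    int length = 5;
    System.out.println(absIndex(Integer.MIN_VALUE, length));      // prints -3
    System.out.println(maskedIndex(Integer.MIN_VALUE, length));   // prints 0
    System.out.println(floorModIndex(Integer.MIN_VALUE, length)); // prints 2
  }
}
```

A negative index would throw ArrayIndexOutOfBoundsException when used against syncRunners, which is the situation the "Should NEVER get here" branch would then have to handle.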





[jira] [Commented] (HBASE-14778) Make block cache hit percentages not integer in the metrics system

2015-11-06 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994773#comment-14994773
 ] 

Jonathan Hsieh commented on HBASE-14778:


Do the numbers look ok in the web interface or could they use some formatting? 
(see ServerMetricsTmpl.jamon).



> Make block cache hit percentages not integer in the metrics system
> --
>
> Key: HBASE-14778
> URL: https://issues.apache.org/jira/browse/HBASE-14778
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14778.patch
>
>
> Once you're close to the 90%+ it's hard to see a difference because getting a 
> full percent change is rare.





[jira] [Updated] (HBASE-14784) Port conflict is not resolved in HBaseTestingUtility.randomFreePort()

2015-11-06 Thread Youngjoon Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youngjoon Kim updated HBASE-14784:
--
Attachment: HBASE-14784.patch

Add a patch targeting master branch.

> Port conflict is not resolved in HBaseTestingUtility.randomFreePort()
> -
>
> Key: HBASE-14784
> URL: https://issues.apache.org/jira/browse/HBASE-14784
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.2
>Reporter: Youngjoon Kim
>Assignee: Youngjoon Kim
>Priority: Minor
> Attachments: HBASE-14784.patch
>
>
> If takenRandomPorts.contains(port) == true, there is a port conflict, so 
> randomFreePort() should rerun the loop. But the continue statement jumps to the 
> loop condition, and because port != 0 the loop exits with the conflicting port.
> {code:title=hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java}
> public static int randomFreePort() {
>   int port = 0;
>   do {
>     port = randomPort();
>     if (takenRandomPorts.contains(port)) {
>       continue;
>     }
>     takenRandomPorts.add(port);
>     ...
>   } while (port == 0);
>   return port;
> }
> {code}





[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994777#comment-14994777
 ] 

Hudson commented on HBASE-14632:


FAILURE: Integrated in HBase-1.3-IT #298 (See 
[https://builds.apache.org/job/HBase-1.3-IT/298/])
HBASE-14632 Region server aborts due to unguarded dereference of Reader (tedyu: 
rev 0e2e5d328071a9f5a116a9fb0ed1df7bc1f562ab)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
HBASE-14632 Revert due to over commit (tedyu: rev 
c69c74fcbfc6c1313bbac174dd181469755b4926)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
HBASE-14632 Region server aborts due to unguarded dereference of Reader (tedyu: 
rev c1a19dece03bc6696dbfbb69e801959f16ab69e9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.secu

[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994788#comment-14994788
 ] 

Hudson commented on HBASE-14632:


FAILURE: Integrated in HBase-Trunk_matrix #440 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/440/])
HBASE-14632 Region server aborts due to unguarded dereference of Reader (tedyu: 
rev 6ec4a968144b7dfcbddcd3648e6139c985044e41)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:302)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is related code:
> {code}
>   public long g

[jira] [Updated] (HBASE-14784) Port conflict is not resolved in HBaseTestingUtility.randomFreePort()

2015-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14784:
---
Status: Patch Available  (was: Open)

> Port conflict is not resolved in HBaseTestingUtility.randomFreePort()
> -
>
> Key: HBASE-14784
> URL: https://issues.apache.org/jira/browse/HBASE-14784
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.2
>Reporter: Youngjoon Kim
>Assignee: Youngjoon Kim
>Priority: Minor
> Attachments: HBASE-14784.patch
>
>
> If takenRandomPorts.contains(port) == true, there is a port conflict, so 
> randomFreePort() should rerun the loop. But the continue statement jumps to the 
> loop condition, and because port != 0 the loop exits with the conflicting port.
> {code:title=hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java}
> public static int randomFreePort() {
>   int port = 0;
>   do {
>     port = randomPort();
>     if (takenRandomPorts.contains(port)) {
>       continue;
>     }
>     takenRandomPorts.add(port);
>     ...
>   } while (port == 0);
>   return port;
> }
> {code}





[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994787#comment-14994787
 ] 

Hudson commented on HBASE-14463:


FAILURE: Integrated in HBase-Trunk_matrix #440 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/440/])
HBASE-14463 Severe performance downgrade when parallel reading a single (tedyu: 
rev 263a0adf79105b9dc166e21c3f5159ade6e2d0a7)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestIdReadWriteLock.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCache.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/IdReadWriteLock.java


> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17
>
> Attachments: 14463-branch-1-v12.txt, GC_with_WeakObjectPool.png, 
> HBASE-14463.patch, HBASE-14463_v11.patch, HBASE-14463_v12.patch, 
> HBASE-14463_v12.patch, HBASE-14463_v2.patch, HBASE-14463_v3.patch, 
> HBASE-14463_v4.patch, HBASE-14463_v5.patch, 
> TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png, pe_use_same_keys.patch, 
> test-results.tar.gz
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In this 
> scenario we launch hundreds of YARN workers, and each worker reads 
> all features of one item (i.e. a single rowkey in HBase), so there is heavy 
> parallel reading on a single rowkey.
> We were using LruCache but recently started trying BucketCache to resolve a GC 
> issue, and, as the title says, we have observed a severe performance downgrade. 
> After some analysis we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
> try {
>   lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
>   // ...
>   if (bucketEntry.equals(backingMap.get(key))) {
>     // ...
>     int len = bucketEntry.getLength();
>     Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>         bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEngine.read involves an array copy, it is much more time-consuming than 
> the operation in LruCache. And since we're using synchronized in 
> IdLock#getLockEntry, parallel reads landing on the same bucket are 
> executed serially, which causes really bad performance.
> To resolve the problem, we propose to use ReentrantReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.
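
To make the proposal concrete, here is a rough sketch of a per-id read/write lock pool. It assumes a plain ConcurrentHashMap that never evicts entries, which a real implementation would have to address (the attachment names suggest a weak-reference pool was explored); the class and method names are illustrative, not the API of the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class IdReadWriteLockSketch {
  // One lock per id (for example a bucket offset). Assumption: entries are never removed.
  private final ConcurrentMap<Long, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  ReentrantReadWriteLock getLock(long id) {
    return locks.computeIfAbsent(id, k -> new ReentrantReadWriteLock());
  }

  public static void main(String[] args) throws InterruptedException {
    IdReadWriteLockSketch pool = new IdReadWriteLockSketch();
    ReentrantReadWriteLock lock = pool.getLock(42L);

    lock.readLock().lock(); // first reader holds the read lock
    try {
      Thread second = new Thread(() -> {
        // A second reader on the same id is not blocked by the first one.
        boolean acquired = pool.getLock(42L).readLock().tryLock();
        System.out.println("second reader acquired: " + acquired);
        if (acquired) {
          pool.getLock(42L).readLock().unlock();
        }
      });
      second.start();
      second.join();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Readers of the same offset share the read lock, while a writer (for example an eviction) would take the write lock and exclude them; with a synchronized IdLock every reader excludes every other reader.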





[jira] [Commented] (HBASE-14706) RegionLocationFinder should return multiple servernames by top host

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994796#comment-14994796
 ] 

Hudson commented on HBASE-14706:


SUCCESS: Integrated in HBase-1.1-JDK7 #1588 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1588/])
HBASE-14706 RegionLocationFinder should return multiple servernames by (tedyu: 
rev 7098e8112202fd855e13dc69dec21a7d3006c4b6)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionLocationFinder.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/RegionLocationFinder.java


> RegionLocationFinder should return multiple servernames by top host
> ---
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14706-branch-1.1.patch, 
> HBASE-14706-trunk_v1.patch, HBASE-14706-trunk_v2.patch, 
> HBASE-14706-trunk_v3.patch, HBASE-14706-trunk_v4.patch, HBASE-14706.patch
>
>
> Multiple RegionServers can run on the same host. But in the current 
> RegionLocationFinder, mapHostNameToServerName maps each host to only one 
> server. This makes LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}
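
As a rough illustration of the problem described above, the sketch below keeps every server for a host instead of overwriting the previous entry. It is not the committed patch: the class HostToServersSketch, the groupByHost and hostnameOf helpers, and the use of plain "host,port,startcode" strings in place of HBase's ServerName are all assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: server names are plain "host,port,startcode" strings.
class HostToServersSketch {
  // Extracts the hostname part of a "host,port,startcode" string.
  static String hostnameOf(String serverName) {
    return serverName.substring(0, serverName.indexOf(','));
  }

  // Maps each hostname to all servers running on it. A plain put() keyed by
  // hostname would keep only the last server seen for each host.
  static Map<String, List<String>> groupByHost(List<String> regionServers) {
    Map<String, List<String>> hostToServers = new HashMap<>();
    for (String sn : regionServers) {
      hostToServers
          .computeIfAbsent(hostnameOf(sn), h -> new ArrayList<>())
          .add(sn);
    }
    return hostToServers;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList(
        "host1,16020,1", "host1,16021,2", "host2,16020,3");
    System.out.println(groupByHost(servers));
  }
}
```

Both servers on host1 survive in the result, so a locality calculation keyed by top host could consider each of them.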





[jira] [Commented] (HBASE-14632) Region server aborts due to unguarded dereference of Reader

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994800#comment-14994800
 ] 

Hudson commented on HBASE-14632:


SUCCESS: Integrated in HBase-1.2-IT #268 (See 
[https://builds.apache.org/job/HBase-1.2-IT/268/])
HBASE-14632 Region server aborts due to unguarded dereference of Reader (tedyu: 
rev 805fcc63466ed79760e4b41d62c6d5c7fe59d7c9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java


> Region server aborts due to unguarded dereference of Reader
> ---
>
> Key: HBASE-14632
> URL: https://issues.apache.org/jira/browse/HBASE-14632
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14632-v1.txt, 14632-v2.txt
>
>
> I noticed the following in one run of 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster-output.txt 
> :
> {code}
> 2015-10-16 09:46:33,108 INFO  [main] client.HBaseAdmin$10(1233): Started 
> disable of testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,108 INFO  
> [B.defaultRpcServer.handler=4,queue=0,port=38813] master.HMaster(1908): 
> Client=hbase/null disable   
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck
> 2015-10-16 09:46:33,223 DEBUG 
> [B.defaultRpcServer.handler=4,queue=0,port=38813] 
> procedure2.ProcedureExecutor(654): Procedure DisableTableProcedure
> 
> (table=testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck) 
> id=24 owner=hbase state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2015-10-16 09:46:33,225 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=38813] 
> master.MasterRpcServices(1057): Checking to see if procedure is done procId=24
> 2015-10-16 09:46:33,230 DEBUG [ProcedureExecutor-22] 
> lock.ZKInterProcessLockBase(226): Acquired a lock for /hbase/table-lock/  
>
> testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/write-master:3881301
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1910): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,320 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 WARN  [RS:0;cn012:53683] regionserver.HStore(1924): 
> StoreFile 
> hdfs://localhost:40022/user/hbase/test-data/f09d7163-94f7-4218-b1b0-43dfc733a37b/data/
>   
> default/testStoreFileReferenceCreationWhenSplitPolicySaysToSkipRangeCheck/dc90661cebac678ac508ed98093fc3e9/f/fffae6d1a0234c1791d8098cbcdb2c5e
>  has a null Reader
> 2015-10-16 09:46:33,321 FATAL [RS:0;cn012:53683] 
> regionserver.HRegionServer(2078): ABORTING region server 
> cn012.l42scl.hortonworks.com,53683,1445013948320: Unhandled: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HStore.getTotalStaticIndexSize(HStore.java:1936)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1470)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1206)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:965)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:302)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is related code:
> {code}
>   public long getTotalStati

[jira] [Commented] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2015-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994808#comment-14994808
 ] 

stack commented on HBASE-14783:
---

We do not support downgrade. Never have. It's a bunch of work -- code-wise and 
testing-wise. Suggest we close this issue as invalid.

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang
>
> I was running ITBLL with 1.3 deployed on a 6 node cluster.
> Then I stopped the cluster, deployed 1.1 release and tried to start cluster.
> However, master failed to start due to:
> {code}
> 2015-11-06 00:58:40,351 FATAL [eval-test-2:2.activeMasterManager] 
> master.HMaster: Failed to become active master
> java.io.IOException: The procedure class 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure must be 
> accessible and have an empty constructor
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:548)
>   at org.apache.hadoop.hbase.procedure2.Procedure.convert(Procedure.java:640)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.read(ProcedureWALFormatReader.java:105)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:82)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:298)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:275)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:434)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1208)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1107)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:694)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:186)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1713)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:191)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:536)
>   ... 12 more
> {code}
> The cause was that ServerCrashProcedure, written to a WAL file under 
> MasterProcWALs during the first run, is absent from the 1.1 release.
> After a brief discussion with Stephen, I am logging this JIRA to solicit 
> discussion on how the customer experience can be improved when a downgrade of 
> HBase is performed.
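
The failure mode in the stack trace above can be reproduced in isolation. The sketch below is not HBase's Procedure.newInstance; the class ProcedureLoadSketch, its methods, and the class name org.example.hypothetical.MissingProcedure are assumptions for illustration. It only shows the general pattern: a class name persisted by a newer release is loaded reflectively by an older release that does not ship that class, and the resulting ClassNotFoundException is wrapped in an IOException.

```java
import java.io.IOException;

// Hypothetical sketch of reflectively instantiating a class by its stored name.
class ProcedureLoadSketch {
  // Loads the named class and calls its no-argument constructor.
  static Object newInstance(String className) throws IOException {
    try {
      Class<?> clazz = Class.forName(className);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // Covers ClassNotFoundException, NoSuchMethodException and friends.
      throw new IOException("The procedure class " + className
          + " must be accessible and have an empty constructor", e);
    }
  }

  // Convenience wrapper: true if the class can be loaded and instantiated.
  static boolean canLoad(String className) {
    try {
      newInstance(className);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A class present on the classpath loads fine.
    System.out.println(canLoad("java.util.ArrayList"));
    // A class name written by a newer release, absent from this classpath.
    System.out.println(canLoad("org.example.hypothetical.MissingProcedure"));
  }
}
```

One possible direction for the discussion is to catch this case at load time and report which stored entries belong to an unknown procedure class, rather than aborting with a bare stack trace.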




