date:20180403

[jira] [Created] (HBASE-20340) potential null pointer exception in org.apache.hadoop.hbase.rest.TableScanResource given empty resource

2018-04-03 Thread andy zhou (JIRA)

andy zhou created HBASE-20340:
-

 Summary: potential null pointer exception in 
org.apache.hadoop.hbase.rest.TableScanResource given empty resource
 Key: HBASE-20340
 URL: https://issues.apache.org/jira/browse/HBASE-20340
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 2.0.0-beta-2
Reporter: andy zhou


Our code analyzer has detected a potential null pointer issue in 
org.apache.hadoop.hbase.rest.TableScanResource as follows:
{code:java}
// org.apache.hadoop.hbase.client.Result.java Line #244 
public List listCells() {
return isEmpty()? null: Arrays.asList(rawCells());
}
{code}
{code:java}
// org.apache.hadoop.hbase.rest.TableScanResource Line #95
List kvs = rs.listCells(); 
for (Cell kv : kvs) { ... }
{code}
 

Given empty rs, kvs shall be null instead of an empty list, leading to 
potential null pointer exception

 

(org.apache.hadoop.hbase.client.Result.java Line #244 and 
org.apache.hadoop.hbase.rest.TableScanResource Line #95

Linkage to the code is here:

https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Result.java#L244

SourceBrella Inc.

https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/TableScanResource.java#L95



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20219) An error occurs when scanning with reversed=true and loadColumnFamiliesOnDemand=true

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425085#comment-16425085
 ] 

Hadoop QA commented on HBASE-20219:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  1m 
35s{color} | {color:red} branch has 7 errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} hbase-server: The patch generated 0 new + 5 
unchanged - 4 fixed = 5 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 
13s{color} | {color:red} patch has 7 errors when building our shaded downstream 
artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
15m 46s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 
54s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20219 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917493/HBASE-20219.master.004.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3c17ed9515f5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / b1b0db3195 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12295/testReport/ |
| Max. process+thread count | 4210 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12295/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This messa

[jira] [Commented] (HBASE-19800) hbase shell api 'list' or program api prefix has regex problem

2018-04-03 Thread Nihal Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425083#comment-16425083
 ] 

Nihal Jain commented on HBASE-19800:


Actually shell/API is returning all the tablenames which belong to any 
namespace starting with 'd'. As the default namespace for hbase is "default", 
you are getting all the tables in that namespace.

Try running:
{code:java}
list "default:d.*"
{code}
It will only return tables starting with character 'd' in default namespace 
"default".

> hbase shell api 'list' or program api prefix has regex problem
> --
>
> Key: HBASE-19800
> URL: https://issues.apache.org/jira/browse/HBASE-19800
> Project: HBase
>  Issue Type: Bug
>  Components: API, shell
>Affects Versions: 1.2.1
> Environment: hadoop 2.7.3 
> hbase 1.2.1
>Reporter: zzzhy
>Priority: Major
> Attachments: image-2018-01-16-12-08-20-327.png, 
> image-2018-01-16-12-13-25-723.png
>
>
> while using list command in hbase shell, most of all works well except which 
> one contains 'd' char, as well as hbase program api prefix regex.
> eg.  list 'd.\*' wont work, but '^d\[\da-f\]\{31\}' works well. and 'd.\*' 
> performs just listing all of the tables.
> !image-2018-01-16-12-13-25-723.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20339) A potential security issue in org.apache.hadoop.hbase.http.log.LogLevel.java

2018-04-03 Thread andy zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

andy zhou updated HBASE-20339:
--
Description: 
Our program analyzer has detected a potential security issue as follows 
{code:java}
PrintWriter out = ServletUtil.initHTML(response, "Log Level");
String logName = ServletUtil.getParameter(request, "log");
String level = ServletUtil.getParameter(request, "level");

if (logName != null) {
   out.println("Results");
   out.println(MARKER
+ "Submitted Log Name: " + logName + "");
  ...
}{code}
Above is the code piece. Seems that the log name is directly collected from the 
web request, and only whether the data is null is checked. So an attacker may 
provide a "logName" with a piece of injected code, leading to cross-site 
attacks. And besides, the variable "level" may also have such vulnerability.

 

(org.apache.hadoop.hbase.http.log.LogLevel.java Line 111/118)

Linkage to the code is here:

[https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-http/src/main/java/org/apache/hadoop/hbase/http/log/LogLevel.java#L111]

 

SourceBrella inc.

  was:
Our program analyzer have detected a potential security issue as follows 
{code:java}
PrintWriter out = ServletUtil.initHTML(response, "Log Level");
String logName = ServletUtil.getParameter(request, "log");
String level = ServletUtil.getParameter(request, "level");

if (logName != null) {
   out.println("Results");
   out.println(MARKER
+ "Submitted Log Name: " + logName + "");
  ...
}{code}
Above is the code piece. Seems that the log name is directly collected from the 
web request, and only whether the data is null is checked. So an attacker may 
provide a logName with a piece of injected code leading to cross-site attacks. 
And besides, the variable "level" may also have such vulnerability.

 

(org.apache.hadoop.hbase.http.log.LogLevel.java Line 111/118)

Linkage to the code is here:

https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-http/src/main/java/org/apache/hadoop/hbase/http/log/LogLevel.java#L111

 

SourceBrella inc.


> A potential security issue in org.apache.hadoop.hbase.http.log.LogLevel.java
> 
>
> Key: HBASE-20339
> URL: https://issues.apache.org/jira/browse/HBASE-20339
> Project: HBase
>  Issue Type: Bug
>  Components: hbase
>Affects Versions: 2.0.0-beta-2
>Reporter: andy zhou
>Priority: Major
>
> Our program analyzer has detected a potential security issue as follows 
> {code:java}
> PrintWriter out = ServletUtil.initHTML(response, "Log Level");
> String logName = ServletUtil.getParameter(request, "log");
> String level = ServletUtil.getParameter(request, "level");
> if (logName != null) {
>out.println("Results");
>out.println(MARKER
> + "Submitted Log Name: " + logName + "");
>   ...
> }{code}
> Above is the code piece. Seems that the log name is directly collected from 
> the web request, and only whether the data is null is checked. So an attacker 
> may provide a "logName" with a piece of injected code, leading to cross-site 
> attacks. And besides, the variable "level" may also have such vulnerability.
>  
> (org.apache.hadoop.hbase.http.log.LogLevel.java Line 111/118)
> Linkage to the code is here:
> [https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-http/src/main/java/org/apache/hadoop/hbase/http/log/LogLevel.java#L111]
>  
> SourceBrella inc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HBASE-20339) A potential security issue in org.apache.hadoop.hbase.http.log.LogLevel.java

2018-04-03 Thread andy zhou (JIRA)

andy zhou created HBASE-20339:
-

 Summary: A potential security issue in 
org.apache.hadoop.hbase.http.log.LogLevel.java
 Key: HBASE-20339
 URL: https://issues.apache.org/jira/browse/HBASE-20339
 Project: HBase
  Issue Type: Bug
  Components: hbase
Affects Versions: 2.0.0-beta-2
Reporter: andy zhou


Our program analyzer have detected a potential security issue as follows 
{code:java}
PrintWriter out = ServletUtil.initHTML(response, "Log Level");
String logName = ServletUtil.getParameter(request, "log");
String level = ServletUtil.getParameter(request, "level");

if (logName != null) {
   out.println("Results");
   out.println(MARKER
+ "Submitted Log Name: " + logName + "");
  ...
}{code}
Above is the code piece. Seems that the log name is directly collected from the 
web request, and only whether the data is null is checked. So an attacker may 
provide a logName with a piece of injected code leading to cross-site attacks. 
And besides, the variable "level" may also have such vulnerability.

 

(org.apache.hadoop.hbase.http.log.LogLevel.java Line 111/118)

Linkage to the code is here:

https://github.com/apache/hbase/blob/9e9b347d667e1fc6165c9f8ae5ae7052147e8895/hbase-http/src/main/java/org/apache/hadoop/hbase/http/log/LogLevel.java#L111

 

SourceBrella inc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425075#comment-16425075
 ] 

stack commented on HBASE-20188:
---

[~eshcar] Did you use hbase defaults or did you change segment count or flush 
size from default? I see NONE does better than BASIC always. Thanks.

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425066#comment-16425066
 ] 

stack commented on HBASE-20188:
---

I added a third sheet name "Short Circuit Reads 25M Run" at [1] with timings 
with short-circuit read in place for hbase1 and hbase2. Here are findings:

{quote}
Findings: hbase1.x performs better than 2.x in pure read and pure write modes 
(but we are now within 10%). Mixed load (workloada 50/50), hbase2 is better no 
matter what combination. FastPath RPC Scheduler, the default for hbase2 is 
better than the hbase1 RCP scheduler though it looks ugly in the thread dumps 
with all threads seemingly backed up on its Semaphore coordinator.  hbase2 uses 
more CPU but seems to have a flatter GC profile. in-memory compaction costs. 
For load, no-in-memory compaction is 4% slower than hbase1, but with in-memory 
compaction, it is 9% slower. For workloada, no-in-memory-compaction is 25% 
faster than hbase1 and with in-memory compaction, 17% faster. For workloadc, 
with no-in-memory compaction, we are 2% slower. With it, we are 5% slower.
{quote}

1. 
https://docs.google.com/spreadsheets/d/1w2NBqAPFthG8Ib4C0pHpLARYpWoIF2Vck2vHZW77zE4/edit#gid=1651250875

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-16499) slow replication for small HBase clusters

2018-04-03 Thread Ashish Singhi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425041#comment-16425041
 ] 

Ashish Singhi commented on HBASE-16499:
---

[~busbey], can you please review the addendum so that I can commit the same 
after your blessings.

> slow replication for small HBase clusters
> -
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Vikas Vishwakarma
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499-addendum.patch, HBASE-16499.patch, 
> HBASE-16499.patch
>
>
> For small clusters 10-20 nodes we recently observed that replication is 
> progressing very slowly when we do bulk writes and there is lot of lag 
> accumulation on AgeOfLastShipped / SizeOfLogQueue. From the logs we observed 
> that the number of threads used for shipping wal edits in parallel comes from 
> the following equation in HBaseInterClusterReplicationEndpoint
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>   replicationSinkMgr.getSinks().size());
> ... 
>   for (int i=0; i entryLists.add(new ArrayList(entries.size()/n+1)); <-- 
> batch size
>   }
> ...
> for (int i=0; i  .
> // RuntimeExceptions encountered here bubble up and are handled 
> in ReplicationSource
> pool.submit(createReplicator(entryLists.get(i), i));  <-- 
> concurrency 
> futures++;
>   }
> }
> maxThreads is fixed & configurable and since we are taking min of the three 
> values n gets decided based replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate
> replicationSinkMgr.getSinks().size() is decided based on 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10% so for small 
> clusters of size 10-20 RegionServers  the value we get for numSinks and hence 
> n is very small like 1 or 2. This substantially reduces the pool concurrency 
> used for shipping wal edits in parallel effectively slowing down replication 
> for small clusters and causing lot of lag accumulation in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear off the entire replication queue 
> even after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen 
> multi-fold improvement in total replication time (will update the results 
> here). I wanted to propose here that we should increase the default value for 
> replication.source.ratio also so that we have sufficient concurrency even for 
> small clusters. We figured it out after lot of iterations and debugging so 
> probably slightly higher default will save the trouble. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-16499) slow replication for small HBase clusters

2018-04-03 Thread Ashish Singhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-16499:
--
Attachment: HBASE-16499-addendum.patch

> slow replication for small HBase clusters
> -
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Vikas Vishwakarma
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499-addendum.patch, HBASE-16499.patch, 
> HBASE-16499.patch
>
>
> For small clusters 10-20 nodes we recently observed that replication is 
> progressing very slowly when we do bulk writes and there is lot of lag 
> accumulation on AgeOfLastShipped / SizeOfLogQueue. From the logs we observed 
> that the number of threads used for shipping wal edits in parallel comes from 
> the following equation in HBaseInterClusterReplicationEndpoint
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>   replicationSinkMgr.getSinks().size());
> ... 
>   for (int i=0; i entryLists.add(new ArrayList(entries.size()/n+1)); <-- 
> batch size
>   }
> ...
> for (int i=0; i  .
> // RuntimeExceptions encountered here bubble up and are handled 
> in ReplicationSource
> pool.submit(createReplicator(entryLists.get(i), i));  <-- 
> concurrency 
> futures++;
>   }
> }
> maxThreads is fixed & configurable and since we are taking min of the three 
> values n gets decided based replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate
> replicationSinkMgr.getSinks().size() is decided based on 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10% so for small 
> clusters of size 10-20 RegionServers  the value we get for numSinks and hence 
> n is very small like 1 or 2. This substantially reduces the pool concurrency 
> used for shipping wal edits in parallel effectively slowing down replication 
> for small clusters and causing lot of lag accumulation in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear off the entire replication queue 
> even after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen 
> multi-fold improvement in total replication time (will update the results 
> here). I wanted to propose here that we should increase the default value for 
> replication.source.ratio also so that we have sufficient concurrency even for 
> small clusters. We figured it out after lot of iterations and debugging so 
> probably slightly higher default will save the trouble. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-16499) slow replication for small HBase clusters

2018-04-03 Thread Ashish Singhi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425037#comment-16425037
 ] 

Ashish Singhi commented on HBASE-16499:
---

Sure will add that immediately.. Sorry for not doing it before, I was not aware 
of it!

> slow replication for small HBase clusters
> -
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Vikas Vishwakarma
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters 10-20 nodes we recently observed that replication is 
> progressing very slowly when we do bulk writes and there is lot of lag 
> accumulation on AgeOfLastShipped / SizeOfLogQueue. From the logs we observed 
> that the number of threads used for shipping wal edits in parallel comes from 
> the following equation in HBaseInterClusterReplicationEndpoint
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>   replicationSinkMgr.getSinks().size());
> ... 
>   for (int i=0; i entryLists.add(new ArrayList(entries.size()/n+1)); <-- 
> batch size
>   }
> ...
> for (int i=0; i  .
> // RuntimeExceptions encountered here bubble up and are handled 
> in ReplicationSource
> pool.submit(createReplicator(entryLists.get(i), i));  <-- 
> concurrency 
> futures++;
>   }
> }
> maxThreads is fixed & configurable and since we are taking min of the three 
> values n gets decided based replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate
> replicationSinkMgr.getSinks().size() is decided based on 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10% so for small 
> clusters of size 10-20 RegionServers  the value we get for numSinks and hence 
> n is very small like 1 or 2. This substantially reduces the pool concurrency 
> used for shipping wal edits in parallel effectively slowing down replication 
> for small clusters and causing lot of lag accumulation in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear off the entire replication queue 
> even after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen 
> multi-fold improvement in total replication time (will update the results 
> here). I wanted to propose here that we should increase the default value for 
> replication.source.ratio also so that we have sufficient concurrency even for 
> small clusters. We figured it out after lot of iterations and debugging so 
> probably slightly higher default will save the trouble. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Reopened] (HBASE-16499) slow replication for small HBase clusters

2018-04-03 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reopened HBASE-16499:
-

reopening, please add to the 2.0 upgrade section that calls out changes in 
default values to note this one.

> slow replication for small HBase clusters
> -
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Vikas Vishwakarma
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters 10-20 nodes we recently observed that replication is 
> progressing very slowly when we do bulk writes and there is lot of lag 
> accumulation on AgeOfLastShipped / SizeOfLogQueue. From the logs we observed 
> that the number of threads used for shipping wal edits in parallel comes from 
> the following equation in HBaseInterClusterReplicationEndpoint
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>   replicationSinkMgr.getSinks().size());
> ... 
>   for (int i=0; i entryLists.add(new ArrayList(entries.size()/n+1)); <-- 
> batch size
>   }
> ...
> for (int i=0; i  .
> // RuntimeExceptions encountered here bubble up and are handled 
> in ReplicationSource
> pool.submit(createReplicator(entryLists.get(i), i));  <-- 
> concurrency 
> futures++;
>   }
> }
> maxThreads is fixed & configurable and since we are taking min of the three 
> values n gets decided based replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate
> replicationSinkMgr.getSinks().size() is decided based on 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10% so for small 
> clusters of size 10-20 RegionServers  the value we get for numSinks and hence 
> n is very small like 1 or 2. This substantially reduces the pool concurrency 
> used for shipping wal edits in parallel effectively slowing down replication 
> for small clusters and causing lot of lag accumulation in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear off the entire replication queue 
> even after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen 
> multi-fold improvement in total replication time (will update the results 
> here). I wanted to propose here that we should increase the default value for 
> replication.source.ratio also so that we have sufficient concurrency even for 
> small clusters. We figured it out after lot of iterations and debugging so 
> probably slightly higher default will save the trouble. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20337) Update the doc on how to setup shortcircuit reads; its stale

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425020#comment-16425020
 ] 

Hadoop QA commented on HBASE-20337:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}186m 
15s{color} | {color:green} root in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
27s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}202m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20337 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917410/HBASE-20337.master.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  |
| uname | Linux 4c82fd4b8651 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / b1b0db3195 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12294/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12294/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12294/artifact/patchprocess/patch-asflicense-problems.txt
 |
| Max. process+thread count | 4297 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12294/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Update the doc on how to setup shortcircuit reads; its stale
> 
>
> Key: HBASE-20337
> URL: https://issues.apache.org/jira/browse/HBASE-20337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20337.master.001.patch, 
> HBASE-20337.master.002.patch
>
>
> The doc is from another era. Update it. Short-circuit reads can make a big 
> difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425019#comment-16425019
 ] 

stack edited comment on HBASE-20188 at 4/4/18 5:14 AM:
---

[~ram_krish] Setting this on the client-side


  dfs.domain.socket.path
  /home/stack/sockets/stack_dn_socket
  
This configuration parameter turns on short-circuit local reads.
  

See paragraph above and the flamegraphs for diff between hbase1 and hbase2 w/o 
the above.

dfs.client.read.shortcircuit.skip.checksum makes sense. Let me try it here and 
see if it helps. Let me add it to the doc over on HBASE-20337.

Do you recall what prompted your upping of the  
'dfs.client.read.shortcircuit.streams.cache.size' and 
'dfs.client.socketcache.capacity' values? Lets get that into HBASE-20337 too.

You have "... We have done some detalied study on the effect of short circuit 
reads and have our analysis on it." Is it available anywhere boss?


was (Author: stack):
[~ram_krish] Setting this on the client-side


  dfs.domain.socket.path
  /home/stack/sockets/stack_dn_socket
  
This configuration parameter turns on short-circuit local reads.
  

dfs.client.read.shortcircuit.skip.checksum makes sense. Let me try it here and 
see if it helps. Let me add it to the doc over on HBASE-20337.

Do you recall what prompted your upping of the  
'dfs.client.read.shortcircuit.streams.cache.size' and 
'dfs.client.socketcache.capacity' values? Lets get that into HBASE-20337 too.

You have "... We have done some detalied study on the effect of short circuit 
reads and have our analysis on it." Is it available anywhere boss?

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425019#comment-16425019
 ] 

stack commented on HBASE-20188:
---

[~ram_krish] Setting this on the client-side


  dfs.domain.socket.path
  /home/stack/sockets/stack_dn_socket
  
This configuration parameter turns on short-circuit local reads.
  

dfs.client.read.shortcircuit.skip.checksum makes sense. Let me try it here and 
see if it helps. Let me add it to the doc over on HBASE-20337.

Do you recall what prompted your upping of the  
'dfs.client.read.shortcircuit.streams.cache.size' and 
'dfs.client.socketcache.capacity' values? Lets get that into HBASE-20337 too.

You have "... We have done some detalied study on the effect of short circuit 
reads and have our analysis on it." Is it available anywhere boss?

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20149) Purge dev javadoc from bin tarball (or make a separate tarball of javadoc)

2018-04-03 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425014#comment-16425014
 ] 

Sean Busbey commented on HBASE-20149:
-

and optionally we could use a different descriptor to make a tarball that just 
has those reports (or maybe the whole site)

> Purge dev javadoc from bin tarball (or make a separate tarball of javadoc)
> --
>
> Key: HBASE-20149
> URL: https://issues.apache.org/jira/browse/HBASE-20149
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, community, documentation
>Reporter: stack
>Assignee: Artem Ervits
>Priority: Critical
> Fix For: 2.0.0
>
>
> The bin tarball is too fat (Chia-Ping and Josh noticed it on the beta-2 
> vote). A note to the dev list subsequently resulted in suggestion that we 
> just purge dev javadoc (or even all javadoc) from bin tarball (Andrew). Sean 
> was good w/ it and suggested perhaps we could do a javadoc only tgz. Let me 
> look into this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20149) Purge dev javadoc from bin tarball (or make a separate tarball of javadoc)

2018-04-03 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425013#comment-16425013
 ] 

Sean Busbey commented on HBASE-20149:
-

I think the change needs to be done to the assembly descriptor we use to build 
our binary tarball ({{hbase-assembly/src/main/assembly/components.xml}}). The 
section on the site docs needs to have an exclusion node added to keep out the 
4 javadoc reports.

> Purge dev javadoc from bin tarball (or make a separate tarball of javadoc)
> --
>
> Key: HBASE-20149
> URL: https://issues.apache.org/jira/browse/HBASE-20149
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, community, documentation
>Reporter: stack
>Assignee: Artem Ervits
>Priority: Critical
> Fix For: 2.0.0
>
>
> The bin tarball is too fat (Chia-Ping and Josh noticed it on the beta-2 
> vote). A note to the dev list subsequently resulted in suggestion that we 
> just purge dev javadoc (or even all javadoc) from bin tarball (Andrew). Sean 
> was good w/ it and suggested perhaps we could do a javadoc only tgz. Let me 
> look into this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20243) [Shell] Add shell command to create a new table by cloning the existent table

2018-04-03 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425010#comment-16425010
 ] 

Guangxu Cheng commented on HBASE-20243:
---

Attach 006 patch as [~appy] suggestions.Thanks

> [Shell] Add shell command to create a new table by cloning the existent table
> -
>
> Key: HBASE-20243
> URL: https://issues.apache.org/jira/browse/HBASE-20243
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HBASE-20243.master.001.patch, 
> HBASE-20243.master.002.patch, HBASE-20243.master.003.patch, 
> HBASE-20243.master.004.patch, HBASE-20243.master.005.patch, 
> HBASE-20243.master.006.patch
>
>
> In the production environment, we need to create a new table every day. The 
> schema and the split keys of the table are the same as that of yesterday's 
> table, only the name of the table is different. For example, 
> x_20180321,x_20180322 etc.But now there is no convenient command to 
> do this. So we may need such a command(clone_table) to create a new table by 
> cloning the existent table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20231) Not able to delete column family from a row using RemoteHTable

2018-04-03 Thread Ashish Singhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-20231:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0
   1.4.4
   Status: Resolved  (was: Patch Available)

> Not able to delete column family from a row using RemoteHTable
> --
>
> Key: HBASE-20231
> URL: https://issues.apache.org/jira/browse/HBASE-20231
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0, 1.4.4, 2.0.0
>
> Attachments: HBASE-20231-branch-1-v2.patch, 
> HBASE-20231-branch-1-v3.patch, HBASE-20231-branch-1.patch, 
> HBASE-20231-v2.patch, HBASE-20231-v3.patch, HBASE-20231.patch
>
>
> Example code to reproduce the issue,
> {code:java}
> Cluster cluster = new Cluster();
> cluster.add("rest-server-IP", rest-server-port);
> Client client = new Client(cluster);
> RemoteHTable table = new RemoteHTable(client, "t1");
> // Insert few records,
> Put put = new Put(Bytes.toBytes("r1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> put = new Put(Bytes.toBytes("r2"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> // Delete the entire column family from the row
> Delete del = new Delete(Bytes.toBytes("r2"));
> del.addFamily(Bytes.toBytes("cf1"));
> table.delete(del);
> {code}
> Here the problem is in building row specification in 
> RemoteHTable.buildRowSpec(). Row specification is framed as "/t1/r2/cf1:" 
> instead of "/t1/r2/cf1". 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20243) [Shell] Add shell command to create a new table by cloning the existent table

2018-04-03 Thread Guangxu Cheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-20243:
--
Attachment: HBASE-20243.master.006.patch

> [Shell] Add shell command to create a new table by cloning the existent table
> -
>
> Key: HBASE-20243
> URL: https://issues.apache.org/jira/browse/HBASE-20243
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HBASE-20243.master.001.patch, 
> HBASE-20243.master.002.patch, HBASE-20243.master.003.patch, 
> HBASE-20243.master.004.patch, HBASE-20243.master.005.patch, 
> HBASE-20243.master.006.patch
>
>
> In the production environment, we need to create a new table every day. The 
> schema and the split keys of the table are the same as that of yesterday's 
> table, only the name of the table is different. For example, 
> x_20180321,x_20180322 etc.But now there is no convenient command to 
> do this. So we may need such a command(clone_table) to create a new table by 
> cloning the existent table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HBASE-20231) Not able to delete column family from a row using RemoteHTable

2018-04-03 Thread Ashish Singhi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425008#comment-16425008
 ] 

Ashish Singhi edited comment on HBASE-20231 at 4/4/18 4:50 AM:
---

Thanks [~pankaj2461] for the patches and [~yuzhih...@gmail.com], [~uagashe] and 
[~chia7712] for the reviews.

I have pushed this change to branch-1.4+. [~pankaj2461], if you attach patch 
for branch-1.2 and branch-1.3, I will commit there as well.


was (Author: ashish singhi):
Thanks [~pankaj2461] for the patches and [~yuzhih...@gmail.com], [~uagashe] and 
[~chia7712] for the reviews.

I have pushed this change to branch-1.4+. If you attach patch for branch-1.2 
and branch-1.3, I will commit there as well.

> Not able to delete column family from a row using RemoteHTable
> --
>
> Key: HBASE-20231
> URL: https://issues.apache.org/jira/browse/HBASE-20231
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-20231-branch-1-v2.patch, 
> HBASE-20231-branch-1-v3.patch, HBASE-20231-branch-1.patch, 
> HBASE-20231-v2.patch, HBASE-20231-v3.patch, HBASE-20231.patch
>
>
> Example code to reproduce the issue,
> {code:java}
> Cluster cluster = new Cluster();
> cluster.add("rest-server-IP", rest-server-port);
> Client client = new Client(cluster);
> RemoteHTable table = new RemoteHTable(client, "t1");
> // Insert few records,
> Put put = new Put(Bytes.toBytes("r1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> put = new Put(Bytes.toBytes("r2"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> // Delete the entire column family from the row
> Delete del = new Delete(Bytes.toBytes("r2"));
> del.addFamily(Bytes.toBytes("cf1"));
> table.delete(del);
> {code}
> Here the problem is in building row specification in 
> RemoteHTable.buildRowSpec(). Row specification is framed as "/t1/r2/cf1:" 
> instead of "/t1/r2/cf1". 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20231) Not able to delete column family from a row using RemoteHTable

2018-04-03 Thread Ashish Singhi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425008#comment-16425008
 ] 

Ashish Singhi commented on HBASE-20231:
---

Thanks [~pankaj2461] for the patches and [~yuzhih...@gmail.com], [~uagashe] and 
[~chia7712] for the reviews.

I have pushed this change to branch-1.4+. If you attach patch for branch-1.2 
and branch-1.3, I will commit there as well.

> Not able to delete column family from a row using RemoteHTable
> --
>
> Key: HBASE-20231
> URL: https://issues.apache.org/jira/browse/HBASE-20231
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-20231-branch-1-v2.patch, 
> HBASE-20231-branch-1-v3.patch, HBASE-20231-branch-1.patch, 
> HBASE-20231-v2.patch, HBASE-20231-v3.patch, HBASE-20231.patch
>
>
> Example code to reproduce the issue,
> {code:java}
> Cluster cluster = new Cluster();
> cluster.add("rest-server-IP", rest-server-port);
> Client client = new Client(cluster);
> RemoteHTable table = new RemoteHTable(client, "t1");
> // Insert few records,
> Put put = new Put(Bytes.toBytes("r1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> put = new Put(Bytes.toBytes("r2"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> put.add(Bytes.toBytes("cf1"), Bytes.toBytes("c2"), Bytes.toBytes("c2"));
> put.add(Bytes.toBytes("cf2"), Bytes.toBytes("c1"), Bytes.toBytes("c1"));
> table.put(put);
> // Delete the entire column family from the row
> Delete del = new Delete(Bytes.toBytes("r2"));
> del.addFamily(Bytes.toBytes("cf1"));
> table.delete(del);
> {code}
> Here the problem is in building row specification in 
> RemoteHTable.buildRowSpec(). Row specification is framed as "/t1/r2/cf1:" 
> instead of "/t1/r2/cf1". 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to branch-2, thanks!

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.addendum.v2.patch, HBASE-20159.branch-2.patch, 
> HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-16499) slow replication for small HBase clusters

2018-04-03 Thread Ashish Singhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-16499:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: Changed the default value for replication.source.ratio from 
0.1 to 0.5. Which means now by default 50% of the total RegionServers in peer 
cluster(s) will participate in replication.
  Status: Resolved  (was: Patch Available)

Thanks [~stack] for the review.

I have pushed this to branch-2.0+

> slow replication for small HBase clusters
> -
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Vikas Vishwakarma
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters 10-20 nodes we recently observed that replication is 
> progressing very slowly when we do bulk writes and there is lot of lag 
> accumulation on AgeOfLastShipped / SizeOfLogQueue. From the logs we observed 
> that the number of threads used for shipping wal edits in parallel comes from 
> the following equation in HBaseInterClusterReplicationEndpoint
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>   replicationSinkMgr.getSinks().size());
> ... 
>   for (int i=0; i entryLists.add(new ArrayList(entries.size()/n+1)); <-- 
> batch size
>   }
> ...
> for (int i=0; i  .
> // RuntimeExceptions encountered here bubble up and are handled 
> in ReplicationSource
> pool.submit(createReplicator(entryLists.get(i), i));  <-- 
> concurrency 
> futures++;
>   }
> }
> maxThreads is fixed & configurable and since we are taking min of the three 
> values n gets decided based replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate
> replicationSinkMgr.getSinks().size() is decided based on 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10% so for small 
> clusters of size 10-20 RegionServers  the value we get for numSinks and hence 
> n is very small like 1 or 2. This substantially reduces the pool concurrency 
> used for shipping wal edits in parallel effectively slowing down replication 
> for small clusters and causing lot of lag accumulation in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear off the entire replication queue 
> even after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen 
> multi-fold improvement in total replication time (will update the results 
> here). I wanted to propose here that we should increase the default value for 
> replication.source.ratio also so that we have sufficient concurrency even for 
> small clusters. We figured it out after lot of iterations and debugging so 
> probably slightly higher default will save the trouble. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424993#comment-16424993
 ] 

ramkrishna.s.vasudevan commented on HBASE-20188:


What was the config that was missing for short circuit reads in hbase-2? In our 
recent R + W experiments we had to enable short circuit reads for our tests 
because without that we had Port issues because the underlying drives were 
faster and so every local read trying to establiish the TCP was failing and 
retrying. (thus adding to latency).

When the number of threads were lesser we did not get this issue. With short 
circuit reads this problem was not there but we had to explicitly enable it. We 
have done some detalied study on the effect of short circuit reads and have our 
analysis on it.

How is hbase1 running with short circuit by itself? 

Another important thing Anoop found while doing those test was this config 
{code:java}


dfs.client.read.shortcircuit.skip.checksum
true

{code}
We had to enable it because in hbase we does its own checksums. 

Also we increased the '>dfs.client.read.shortcircuit.streams.cache.size' and 
'dfs.client.socketcache.capacity'

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20299) Update MOB in hbase refguide

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424989#comment-16424989
 ] 

Hudson commented on HBASE-20299:


Results for branch branch-2.0
[build #127 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Update MOB in hbase refguide
> 
>
> Key: HBASE-20299
> URL: https://issues.apache.org/jira/browse/HBASE-20299
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.0.0-beta-2
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-20299-master-v001.patch, 
> HBASE-20299-master-v002.patch
>
>
> MOB section in hbase refguide needs to be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424990#comment-16424990
 ] 

Hudson commented on HBASE-17730:


Results for branch branch-2.0
[build #127 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17730.master.001.patch
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20298) Doc change in read/write/total accounting metrics

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424991#comment-16424991
 ] 

Hudson commented on HBASE-20298:


Results for branch branch-2.0
[build #127 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/127//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Doc change in read/write/total accounting metrics
> -
>
> Key: HBASE-20298
> URL: https://issues.apache.org/jira/browse/HBASE-20298
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20298.master.001.patch, 
> HBASE-20298.master.002.patch, HBASE-20298.master.003.patch, 
> HBASE-20298.master.004.patch
>
>
> Doc the change wrought by the parent issue. Get it up into the refguide as 
> part of the difference between old hbases and hbase2.
> The change confused me and took me a while to untangle.
> The read count is for reads that return a non-empty result now. In old 
> hbase1, we'd increment the read-count even if an empty result. This makes 
> reads look bad in YCSB runs when compared to hbase1 (see how 
> totalRequestCount in hbase2 can be way above the sum of reads+writes; it is 
> because it increments even if the row is not found).
> Let me get this into refguide otherwise poor old operators will be baffled. 
> The release note on the parent is great; it just needs to be in our guide.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20219) An error occurs when scanning with reversed=true and loadColumnFamiliesOnDemand=true

2018-04-03 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424978#comment-16424978
 ] 

Toshihiro Suzuki commented on HBASE-20219:
--

I think we can only add a server side check for this issue here. I just 
attached a rebased patch. Could anyone please review this patch?

> An error occurs when scanning with reversed=true and 
> loadColumnFamiliesOnDemand=true
> 
>
> Key: HBASE-20219
> URL: https://issues.apache.org/jira/browse/HBASE-20219
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-20219-UT.patch, HBASE-20219.master.001.patch, 
> HBASE-20219.master.002.patch, HBASE-20219.master.003.patch, 
> HBASE-20219.master.004.patch
>
>
> I'm facing the following error when scanning with reversed=true and 
> loadColumnFamiliesOnDemand=true:
> {code}
> java.lang.IllegalStateException: requestSeek cannot be called on 
> ReversedKeyValueHeap
>   at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.requestSeek(ReversedKeyValueHeap.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.joinedHeapMayHaveData(HRegion.java:6725)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6652)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6364)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3108)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3345)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41548)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> I will attach a UT patch to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424976#comment-16424976
 ] 

Reid Chan commented on HBASE-18309:
---

I'm ok with this.

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20219) An error occurs when scanning with reversed=true and loadColumnFamiliesOnDemand=true

2018-04-03 Thread Toshihiro Suzuki (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiro Suzuki updated HBASE-20219:
-
Attachment: HBASE-20219.master.004.patch

> An error occurs when scanning with reversed=true and 
> loadColumnFamiliesOnDemand=true
> 
>
> Key: HBASE-20219
> URL: https://issues.apache.org/jira/browse/HBASE-20219
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-20219-UT.patch, HBASE-20219.master.001.patch, 
> HBASE-20219.master.002.patch, HBASE-20219.master.003.patch, 
> HBASE-20219.master.004.patch
>
>
> I'm facing the following error when scanning with reversed=true and 
> loadColumnFamiliesOnDemand=true:
> {code}
> java.lang.IllegalStateException: requestSeek cannot be called on 
> ReversedKeyValueHeap
>   at 
> org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.requestSeek(ReversedKeyValueHeap.java:66)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.joinedHeapMayHaveData(HRegion.java:6725)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6652)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6364)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3108)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3345)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41548)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> I will attach a UT patch to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20299) Update MOB in hbase refguide

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424963#comment-16424963
 ] 

Hudson commented on HBASE-20299:


Results for branch branch-2
[build #567 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Update MOB in hbase refguide
> 
>
> Key: HBASE-20299
> URL: https://issues.apache.org/jira/browse/HBASE-20299
> Project: HBase
>  Issue Type: Task
>  Components: mob
>Affects Versions: 2.0.0-beta-2
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-20299-master-v001.patch, 
> HBASE-20299-master-v002.patch
>
>
> MOB section in hbase refguide needs to be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20298) Doc change in read/write/total accounting metrics

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424965#comment-16424965
 ] 

Hudson commented on HBASE-20298:


Results for branch branch-2
[build #567 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Doc change in read/write/total accounting metrics
> -
>
> Key: HBASE-20298
> URL: https://issues.apache.org/jira/browse/HBASE-20298
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20298.master.001.patch, 
> HBASE-20298.master.002.patch, HBASE-20298.master.003.patch, 
> HBASE-20298.master.004.patch
>
>
> Doc the change wrought by the parent issue. Get it up into the refguide as 
> part of the difference between old hbases and hbase2.
> The change confused me and took me a while to untangle.
> The read count is for reads that return a non-empty result now. In old 
> hbase1, we'd increment the read-count even if an empty result. This makes 
> reads look bad in YCSB runs when compared to hbase1 (see how 
> totalRequestCount in hbase2 can be way above the sum of reads+writes; it is 
> because it increments even if the row is not found).
> Let me get this into refguide otherwise poor old operators will be baffled. 
> The release note on the parent is great; it just needs to be in our guide.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424964#comment-16424964
 ] 

Hudson commented on HBASE-17730:


Results for branch branch-2
[build #567 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/567//JDK8_Nightly_Build_Report_(Hadoop3)/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17730.master.001.patch
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424959#comment-16424959
 ] 

Hadoop QA commented on HBASE-19572:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  1m 
23s{color} | {color:red} branch has 7 errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 
11s{color} | {color:red} patch has 7 errors when building our shaded downstream 
artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
15m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 
48s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-19572 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917475/HBASE-19572.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d73ed28a0737 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / b1b0db3195 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12292/testReport/ |
| Max. process+thread count | 4269 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12292/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> RegionMover should use the co

[jira] [Commented] (HBASE-20337) Update the doc on how to setup shortcircuit reads; its stale

2018-04-03 Thread Yu Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424955#comment-16424955
 ] 

Yu Li commented on HBASE-20337:
---

+1, nice doc boss.

> Update the doc on how to setup shortcircuit reads; its stale
> 
>
> Key: HBASE-20337
> URL: https://issues.apache.org/jira/browse/HBASE-20337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20337.master.001.patch, 
> HBASE-20337.master.002.patch
>
>
> The doc is from another era. Update it. Short-circuit reads can make a big 
> difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Yu Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424939#comment-16424939
 ] 

Yu Li commented on HBASE-18309:
---

Again I'd suggest to open a new JIRA for the backport-to-branch-1 task. This 
improvement is already included in our 2.0.0-beta-1 release and I don't think 
there's any good reason to reopen it for a backport work.

What's more, I think for branch-1 the release note needs some refinement since 
there's no {{hbase.regionserver.hfilecleaner.large.thread.count}} and 
{{hbase.regionserver.hfilecleaner.small.thread.count}} there.

Please let me know your thoughts [~zyork] [~reidchan]. Thanks.

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential backoff for retrying rollWriter()

2018-04-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424937#comment-16424937
 ] 

Wei-Chiu Chuang commented on HBASE-20338:
-

Hi Umesh, I'd like to work on this improvement. Thanks for filing the jira.

> WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential 
> backoff for retrying rollWriter()
> ---
>
> Key: HBASE-20338
> URL: https://issues.apache.org/jira/browse/HBASE-20338
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> In our internal testing we observed that logs are getting flooded due to 
> continuous loop in WALProcedureStore#recoverLease():
> {code}
>   while (isRunning()) {
> // Get Log-MaxID and recover lease on old logs
> try {
>   flushLogId = initOldLogs(oldLogs);
> } catch (FileNotFoundException e) {
>   LOG.warn("Someone else is active and deleted logs. retrying.", e);
>   oldLogs = getLogFiles();
>   continue;
> }
> // Create new state-log
> if (!rollWriter(flushLogId + 1)) {
>   // someone else has already created this log
>   LOG.debug("Someone else has already created log " + flushLogId);
>   continue;
> }
> {code}
> rollWriter() fails to create a new file. Error messages in HDFS namenode logs 
> around same time:
> {code}
> INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 
> 172.31.121.196:38508 Call#3141 Retry#0
> java.io.IOException: Exeption while contacting value generator
> at 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:389)
> at 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:291)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.generateEncryptedKey(KMSClientProvider.java:724)
> at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:511)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2680)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2676)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:477)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:458)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2675)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2815)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2712)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:604)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:115)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
> Caused by: java.net.ConnectException: Connection refused (Connection refused)
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424938#comment-16424938
 ] 

Reid Chan commented on HBASE-18309:
---

{quote}can you put the branch-1 part in review board as well? Thanks
{quote}
[~zyork], fyi, 
[https://reviews.apache.org/r/66417/diff/1#index_heade|https://reviews.apache.org/r/66417/diff/1#index_header]

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Assigned] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential backoff for retrying rollWriter()

2018-04-03 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HBASE-20338:
---

Assignee: Wei-Chiu Chuang

> WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential 
> backoff for retrying rollWriter()
> ---
>
> Key: HBASE-20338
> URL: https://issues.apache.org/jira/browse/HBASE-20338
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> In our internal testing we observed that logs are getting flooded due to 
> continuous loop in WALProcedureStore#recoverLease():
> {code}
>   while (isRunning()) {
> // Get Log-MaxID and recover lease on old logs
> try {
>   flushLogId = initOldLogs(oldLogs);
> } catch (FileNotFoundException e) {
>   LOG.warn("Someone else is active and deleted logs. retrying.", e);
>   oldLogs = getLogFiles();
>   continue;
> }
> // Create new state-log
> if (!rollWriter(flushLogId + 1)) {
>   // someone else has already created this log
>   LOG.debug("Someone else has already created log " + flushLogId);
>   continue;
> }
> {code}
> rollWriter() fails to create a new file. Error messages in HDFS namenode logs 
> around same time:
> {code}
> INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 
> 172.31.121.196:38508 Call#3141 Retry#0
> java.io.IOException: Exeption while contacting value generator
> at 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:389)
> at 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:291)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.generateEncryptedKey(KMSClientProvider.java:724)
> at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:511)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2680)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2676)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:477)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:458)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2675)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2815)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2712)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:604)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:115)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
> Caused by: java.net.ConnectException: Connection refused (Connection refused)
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.c

[jira] [Commented] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Yu Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424928#comment-16424928
 ] 

Yu Li commented on HBASE-20159:
---

The v2 branch-2-addendum patch LGTM, thanks for noticing and taking care of 
this [~mdrob]

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.addendum.v2.patch, HBASE-20159.branch-2.patch, 
> HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: (was: misses.127.workloadc.20180402T200918Z.svg)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: lock.127.workloadc.20180402T200918Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.127.workloadc.20180402T200918Z.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: (was: cpu.127.workloadc.20180402T200918Z.svg)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: (was: cpu.127.workloadc.20180402T200918Z.svg)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: (was: cpu.2.memsize2.c.20180403T160257Z.svg)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.127.workloadc.20180402T200918Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424894#comment-16424894
 ] 

stack commented on HBASE-20188:
---

Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. See 
attached locking profile flamegraph lock.127.workloadc.20180402T200918Z.svg for 
1.2.7 workloadc. Notice how we are blocking on the ShortCircuitCache cache 
inside in *local* BlockReader.

A run against hbase2 with same configurations had the locking profile 
lock.2.memsize2.c.20180403T160257Z.svg. There are a few things going on but we 
are sticking on PeerCache from *remote* BlockReader. Looking in hbase2 
regionserver logs, it seems like we ran fine for a while and then the 
shortcircuit cache would throw exceptions and hold up the handler a while. Our 
doc on short-circuit setup is stale. Updated it here HBASE-20337

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.127.workloadc.20180402T200918Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: cpu.127.workloadc.20180402T200918Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.127.workloadc.20180402T200918Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: (was: cpu.127.workloadc.20180402T200918Z.svg)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.127.workloadc.20180402T200918Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: cpu.127.workloadc.20180402T200918Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Issue Comment Deleted] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Comment: was deleted

(was: Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

[See below...]

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337)

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: cpu.127.workloadc.20180402T200918Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.127.workloadc.20180402T200918Z.svg, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424884#comment-16424884
 ] 

stack edited comment on HBASE-20188 at 4/4/18 2:25 AM:
---

Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

[See below...]

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337


was (Author: stack):
Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

 [^lock.127.workloadc.20180402T200918Z.svg] 

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424884#comment-16424884
 ] 

stack edited comment on HBASE-20188 at 4/4/18 2:24 AM:
---

Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

 [^lock.127.workloadc.20180402T200918Z.svg] 

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337


was (Author: stack):
Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

 [^misses.127.workloadc.20180402T200918Z.svg] 

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424884#comment-16424884
 ] 

stack commented on HBASE-20188:
---

Fixing short-circuit reads config made a big difference to hbase2 read 
throughput putting it close to hbase-1.2.7. Let me update the report. hbase1 
seemed fine with having shortcircuit reads = true but hbase2 was complaining 
falling back on remote reads. The giveaway was the differing lock profiles. 
Here is hbase1's locking profile for workloadc looked like:

 [^misses.127.workloadc.20180402T200918Z.svg] 

Notice how we are blocking on the ShortCircuitCache cache inside in *local* 
BlockReader.

A run against hbase2 with same configurations had this locking profile:

 [^cpu.2.memsize2.c.20180403T160257Z.svg] 

There are a few things going on but we are sticking on PeerCache from *remote* 
BlockReader.

Looking in hbase2 regionserver logs, it seems like we ran fine for a while and 
then the shortcircuit cache would throw exceptions and hold up the handler a 
while. Our doc on short-circuit setup is stale. Updated it here HBASE-20337

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20293) get_splits returns duplicate split points when region replication is on

2018-04-03 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424883#comment-16424883
 ] 

Toshihiro Suzuki commented on HBASE-20293:
--

Do I need to fix the rubocop and ruby-lint errors?

> get_splits returns duplicate split points when region replication is on
> ---
>
> Key: HBASE-20293
> URL: https://issues.apache.org/jira/browse/HBASE-20293
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Minor
> Attachments: HBASE-20293.master.001.patch
>
>
> When region replication is on, get_splits returns duplicate split points like 
> the following:
> {code}
> hbase(main):001:0> create "test", "cf", {REGION_REPLICATION => 3}, SPLITS => 
> ["10"]
> Created table test
> Took 1.0975 seconds
> hbase(main):002:0> get_splits "test"
> Total number of splits = 4
> 10
> 10
> 10
> Took 0.0941 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: lock.2.memsize2.c.20180403T160257Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: cpu.2.memsize2.c.20180403T160257Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> cpu.2.memsize2.c.20180403T160257Z.svg, flamegraph-1072.1.svg, 
> flamegraph-1072.2.svg, lock.2.memsize2.c.20180403T160257Z.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20188) [TESTING] Performance

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20188:
--
Attachment: misses.127.workloadc.20180402T200918Z.svg

> [TESTING] Performance
> -
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
>  Issue Type: Umbrella
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: CAM-CONFIG-V01.patch, HBASE-20188.sh, HBase 2.0 
> performance evaluation - Basic vs None_ system settings.pdf, 
> ITBLL2.5B_1.2.7vs2.0.0_cpu.png, ITBLL2.5B_1.2.7vs2.0.0_gctime.png, 
> ITBLL2.5B_1.2.7vs2.0.0_iops.png, ITBLL2.5B_1.2.7vs2.0.0_load.png, 
> ITBLL2.5B_1.2.7vs2.0.0_memheap.png, ITBLL2.5B_1.2.7vs2.0.0_memstore.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops.png, 
> ITBLL2.5B_1.2.7vs2.0.0_ops_NOT_summing_regions.png, YCSB_CPU.png, 
> YCSB_GC_TIME.png, YCSB_IN_MEMORY_COMPACTION=NONE.ops.png, YCSB_MEMSTORE.png, 
> YCSB_OPs.png, YCSB_in-memory-compaction=NONE.ops.png, YCSB_load.png, 
> flamegraph-1072.1.svg, flamegraph-1072.2.svg, 
> misses.127.workloadc.20180402T200918Z.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is rumor 
> that it is much slower, that the problem is the asyncwal writing. Does 
> in-memory compaction slow us down or speed us up? What happens when you 
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something 
> about perf when 2.0.0 ships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20193) Basic Replication Web UI - Regionserver

2018-04-03 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424867#comment-16424867
 ] 

Jingyun Tian commented on HBASE-20193:
--

[~stack] Still got the compiling issue. Should I try to modify the code? 

> Basic Replication Web UI - Regionserver 
> 
>
> Key: HBASE-20193
> URL: https://issues.apache.org/jira/browse/HBASE-20193
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HBASE-20193.master.001.patch, 
> HBASE-20193.master.002.patch, HBASE-20193.master.003.patch, 
> HBASE-20193.master.004.patch, HBASE-20193.master.004.patch
>
>
> subtask of HBASE-15809. Implementation of replication UI on Regionserver web 
> page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20193) Basic Replication Web UI - Regionserver

2018-04-03 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424864#comment-16424864
 ] 

Jingyun Tian commented on HBASE-20193:
--

[~busbey] Thx, I'll try to help

> Basic Replication Web UI - Regionserver 
> 
>
> Key: HBASE-20193
> URL: https://issues.apache.org/jira/browse/HBASE-20193
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HBASE-20193.master.001.patch, 
> HBASE-20193.master.002.patch, HBASE-20193.master.003.patch, 
> HBASE-20193.master.004.patch, HBASE-20193.master.004.patch
>
>
> subtask of HBASE-15809. Implementation of replication UI on Regionserver web 
> page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20317) Backport HBASE-20261 "Table page (table.jsp) in Master UI does not show replicaIds for hbase meta table" to branch-1

2018-04-03 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424849#comment-16424849
 ] 

Toshihiro Suzuki commented on HBASE-20317:
--

The patch in HBASE-20261 was already pushed to master, branch-2, branch-2.0, 
and branch-1. Do we need to push to other branches such as branch-1.2, 
branch-1.3 and branch-1.4? Thanks.

> Backport HBASE-20261 "Table page (table.jsp) in Master UI does not show 
> replicaIds for hbase meta table" to branch-1
> 
>
> Key: HBASE-20317
> URL: https://issues.apache.org/jira/browse/HBASE-20317
> Project: HBase
>  Issue Type: Sub-task
>  Components: backport
>Reporter: stack
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Backport parent issue to branch-1. Hope you don't mind my assigning you 
> [~brfrn169]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20243) [Shell] Add shell command to create a new table by cloning the existent table

2018-04-03 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424850#comment-16424850
 ] 

Guangxu Cheng commented on HBASE-20243:
---

{quote} - don't we need shell test too?
 - use a variable NUM_SPLITS=2 and use that in variable initializations and 
asserts
 - Add tests for cases when a) source table doesn't exist, b) destination table 
exists. Basically, we should have tests for both success and failure 
scenarios.{quote}
OK, I will add it.
bq. - Does it compile? I don't see FAMILY_0 and 1 in declarations in 
TestAsyncTableAdminApi.java
FAMILY_0 and 1 have been declared in the parent class TestAsyncAdminBase.java
bq. - Any way we can refactor out the common code in test?
Let me try it.

 Thanks [~appy]

> [Shell] Add shell command to create a new table by cloning the existent table
> -
>
> Key: HBASE-20243
> URL: https://issues.apache.org/jira/browse/HBASE-20243
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HBASE-20243.master.001.patch, 
> HBASE-20243.master.002.patch, HBASE-20243.master.003.patch, 
> HBASE-20243.master.004.patch, HBASE-20243.master.005.patch
>
>
> In the production environment, we need to create a new table every day. The 
> schema and the split keys of the table are the same as that of yesterday's 
> table, only the name of the table is different. For example, 
> x_20180321,x_20180322 etc.But now there is no convenient command to 
> do this. So we may need such a command(clone_table) to create a new table by 
> cloning the existent table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20337) Update the doc on how to setup shortcircuit reads; its stale

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20337:
--
Status: Patch Available  (was: Open)

> Update the doc on how to setup shortcircuit reads; its stale
> 
>
> Key: HBASE-20337
> URL: https://issues.apache.org/jira/browse/HBASE-20337
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20337.master.001.patch, 
> HBASE-20337.master.002.patch
>
>
> The doc is from another era. Update it. Short-circuit reads can make a big 
> difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20312) CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424844#comment-16424844
 ] 

stack commented on HBASE-20312:
---

I love your numbers [~chancelq].

Here is my graphs of five YCSB runs of load, workloada (50/50) and then 
workloadc (100% reads). Each run is made of three humps (load, a, c). Disregard 
a lone hump left of center. The five runs are hbase-1.2.7, hbase-2 with and 
without in-memory compaction, etc. The run on the far right is with CCSMap. Its 
profile looks like te others. I have graphs of GC and load too and they seem 
the same. I must be doing something wrong. Thanks.

 !hits.png! 

> CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore
> --
>
> Key: HBASE-20312
> URL: https://issues.apache.org/jira/browse/HBASE-20312
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Xiang Wang
>Assignee: Chance Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 1.1.2-ccsmap-number.png, HBASE-20312-1.3.2.patch, 
> HBASE-20312-master.v1.patch, HBASE-20312-master.v2.patch, 
> HBASE-20312-master.v3.patch, ccsmap-branch-1.1.patch, hits.png, jira1.png, 
> jira2.png, jira3.png
>
>
> Now hbase use ConcurrentSkipListMap as memstore's data structure.
> Although MemSLAB reduces memory fragment brought by key-value pairs.
> Hundred of millions key-value pairs still make young generation 
> garbage-collection(gc) stop time long.
>  
> These are 2 gc problems of ConcurrentSkipListMap:
> 1. HBase needs 3 objects to store one key-value on expectation. One 
> Index(skiplist's average node height is 1), one Node, and one KeyValue. Too 
> many objects are created for memstore.
> 2. Recent inserted KeyValue and its map structure(Index, Node) are assigned 
> on young generation.The card table (for CMS gc algorithm) or RSet(for G1 gc 
> algorithm) will change frequently on high writing throughput, which makes YGC 
> slow.
>  
> We devleoped a new skip-list map called CompactdConcurrentSkipListMap(CCSMap 
> for short),
> which provides similary features like ConcurrentSkipListMap but get rid of 
> Objects for every key-value pairs.
> CCSMap's memory structure is like this picture:
> !jira1.png!
>  
> One CCSMap consists of a certain number of Chunks. One Chunk consists of a 
> certain number of nodes. One node is corresspding one element. This element's 
> all information and its key-value is encoded on a continuous memory segment 
> without any objects.
> Features:
> 1. all insert,update,delete operations is lock-free on CCSMap.
> 2. Consume less memory, it brings 40% memory saving for 50Byte key-value.
> 3. Faster on small key-value because of better cacheline usage. 20~30% better 
> read/write troughput than ConcurrentSkipListMap for 50Byte key-value.
> CCSMap do not support recyle space when deleting element. But it doesn't 
> matter for hbase because of region flush.
> CCSMap has been running on Alibaba's hbase clusters over 17 months, it cuts 
> down YGC time significantly. here are 2 graph of before and after.
> !jira2.png!
> !jira3.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20312) CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20312:
--
Attachment: hits.png

> CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore
> --
>
> Key: HBASE-20312
> URL: https://issues.apache.org/jira/browse/HBASE-20312
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Xiang Wang
>Assignee: Chance Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 1.1.2-ccsmap-number.png, HBASE-20312-1.3.2.patch, 
> HBASE-20312-master.v1.patch, HBASE-20312-master.v2.patch, 
> HBASE-20312-master.v3.patch, ccsmap-branch-1.1.patch, hits.png, jira1.png, 
> jira2.png, jira3.png
>
>
> Now hbase use ConcurrentSkipListMap as memstore's data structure.
> Although MemSLAB reduces memory fragment brought by key-value pairs.
> Hundred of millions key-value pairs still make young generation 
> garbage-collection(gc) stop time long.
>  
> These are 2 gc problems of ConcurrentSkipListMap:
> 1. HBase needs 3 objects to store one key-value on expectation. One 
> Index(skiplist's average node height is 1), one Node, and one KeyValue. Too 
> many objects are created for memstore.
> 2. Recent inserted KeyValue and its map structure(Index, Node) are assigned 
> on young generation.The card table (for CMS gc algorithm) or RSet(for G1 gc 
> algorithm) will change frequently on high writing throughput, which makes YGC 
> slow.
>  
> We devleoped a new skip-list map called CompactdConcurrentSkipListMap(CCSMap 
> for short),
> which provides similary features like ConcurrentSkipListMap but get rid of 
> Objects for every key-value pairs.
> CCSMap's memory structure is like this picture:
> !jira1.png!
>  
> One CCSMap consists of a certain number of Chunks. One Chunk consists of a 
> certain number of nodes. One node is corresspding one element. This element's 
> all information and its key-value is encoded on a continuous memory segment 
> without any objects.
> Features:
> 1. all insert,update,delete operations is lock-free on CCSMap.
> 2. Consume less memory, it brings 40% memory saving for 50Byte key-value.
> 3. Faster on small key-value because of better cacheline usage. 20~30% better 
> read/write troughput than ConcurrentSkipListMap for 50Byte key-value.
> CCSMap do not support recyle space when deleting element. But it doesn't 
> matter for hbase because of region flush.
> CCSMap has been running on Alibaba's hbase clusters over 17 months, it cuts 
> down YGC time significantly. here are 2 graph of before and after.
> !jira2.png!
> !jira3.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20276) [shell] Revert shell REPL change and document

2018-04-03 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424842#comment-16424842
 ] 

Sean Busbey commented on HBASE-20276:
-

yep! updated.

> [shell] Revert shell REPL change and document
> -
>
> Key: HBASE-20276
> URL: https://issues.apache.org/jira/browse/HBASE-20276
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, shell
>Affects Versions: 1.4.0, 2.0.0
>Reporter: Sean Busbey
>Priority: Blocker
> Fix For: 1.4.4, 2.0.0
>
>
> Feedback from [~mdrob] on HBASE-19158:
> {quote}
> Shell:
> HBASE-19770. There was another issue opened where this was identified as a 
> problem so maybe the shape will change further, but I can't find it now.
> {quote}
> New commentary from [~busbey]:
> This was a follow on to HBASE-15965. That change effectively makes it so none 
> of our ruby wrappers can be used to build expressions in an interactive REPL. 
> This is a pretty severe change (most of my tips on HBASE-15611 will break, I 
> think).
> I think we should
> a) Have a DISCUSS thread, spanning dev@ and user@
> b) based on the outcome of that thread, either default to the new behavior or 
> the old behavior
> c) if we keep the HBASE-15965 behavior as  the default, flag it as 
> incompatible, call it out in the hbase 2.0 upgrade section, and update docs 
> (two examples: the output in the shell_exercises sections would be wrong, and 
> the _table_variables section won't work)
> d) In either case document the new flag in the ref guide



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20276) [shell] Revert shell REPL change and document

2018-04-03 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20276:

Summary: [shell] Revert shell REPL change and document  (was: [shell] 
confirm shell REPL change and document)

> [shell] Revert shell REPL change and document
> -
>
> Key: HBASE-20276
> URL: https://issues.apache.org/jira/browse/HBASE-20276
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, shell
>Affects Versions: 1.4.0, 2.0.0
>Reporter: Sean Busbey
>Priority: Blocker
> Fix For: 1.4.4, 2.0.0
>
>
> Feedback from [~mdrob] on HBASE-19158:
> {quote}
> Shell:
> HBASE-19770. There was another issue opened where this was identified as a 
> problem so maybe the shape will change further, but I can't find it now.
> {quote}
> New commentary from [~busbey]:
> This was a follow on to HBASE-15965. That change effectively makes it so none 
> of our ruby wrappers can be used to build expressions in an interactive REPL. 
> This is a pretty severe change (most of my tips on HBASE-15611 will break, I 
> think).
> I think we should
> a) Have a DISCUSS thread, spanning dev@ and user@
> b) based on the outcome of that thread, either default to the new behavior or 
> the old behavior
> c) if we keep the HBASE-15965 behavior as  the default, flag it as 
> incompatible, call it out in the hbase 2.0 upgrade section, and update docs 
> (two examples: the output in the shell_exercises sections would be wrong, and 
> the _table_variables section won't work)
> d) In either case document the new flag in the ref guide



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-04-03 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424840#comment-16424840
 ] 

Toshihiro Suzuki commented on HBASE-19572:
--

I just attached a rebased patch. Could you please take a look at the patch? 
[~esteban]

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, HBASE-19572.patch
>
>
> The issue I ran into HBASE-19499 was due RegionMover not using the port used 
> by {{hbase-site.xml}}. The tool should use the value used in the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-04-03 Thread Toshihiro Suzuki (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiro Suzuki updated HBASE-19572:
-
Attachment: HBASE-19572.master.001.patch

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, HBASE-19572.patch
>
>
> The issue I ran into HBASE-19499 was due RegionMover not using the port used 
> by {{hbase-site.xml}}. The tool should use the value used in the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20312) CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore

2018-04-03 Thread Chance Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424833#comment-16424833
 ] 

Chance Li commented on HBASE-20312:
---

[~stack] sir,  in our 1.1.2 branch, the result is:
 !1.1.2-ccsmap-number.png! 
I will check it in master version again. It's more strange. 

> CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore
> --
>
> Key: HBASE-20312
> URL: https://issues.apache.org/jira/browse/HBASE-20312
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Xiang Wang
>Assignee: Chance Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 1.1.2-ccsmap-number.png, HBASE-20312-1.3.2.patch, 
> HBASE-20312-master.v1.patch, HBASE-20312-master.v2.patch, 
> HBASE-20312-master.v3.patch, ccsmap-branch-1.1.patch, jira1.png, jira2.png, 
> jira3.png
>
>
> Now hbase use ConcurrentSkipListMap as memstore's data structure.
> Although MemSLAB reduces memory fragment brought by key-value pairs.
> Hundred of millions key-value pairs still make young generation 
> garbage-collection(gc) stop time long.
>  
> These are 2 gc problems of ConcurrentSkipListMap:
> 1. HBase needs 3 objects to store one key-value on expectation. One 
> Index(skiplist's average node height is 1), one Node, and one KeyValue. Too 
> many objects are created for memstore.
> 2. Recent inserted KeyValue and its map structure(Index, Node) are assigned 
> on young generation.The card table (for CMS gc algorithm) or RSet(for G1 gc 
> algorithm) will change frequently on high writing throughput, which makes YGC 
> slow.
>  
> We devleoped a new skip-list map called CompactdConcurrentSkipListMap(CCSMap 
> for short),
> which provides similary features like ConcurrentSkipListMap but get rid of 
> Objects for every key-value pairs.
> CCSMap's memory structure is like this picture:
> !jira1.png!
>  
> One CCSMap consists of a certain number of Chunks. One Chunk consists of a 
> certain number of nodes. One node is corresspding one element. This element's 
> all information and its key-value is encoded on a continuous memory segment 
> without any objects.
> Features:
> 1. all insert,update,delete operations is lock-free on CCSMap.
> 2. Consume less memory, it brings 40% memory saving for 50Byte key-value.
> 3. Faster on small key-value because of better cacheline usage. 20~30% better 
> read/write troughput than ConcurrentSkipListMap for 50Byte key-value.
> CCSMap do not support recyle space when deleting element. But it doesn't 
> matter for hbase because of region flush.
> CCSMap has been running on Alibaba's hbase clusters over 17 months, it cuts 
> down YGC time significantly. here are 2 graph of before and after.
> !jira2.png!
> !jira3.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20312) CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore

2018-04-03 Thread Chance Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chance Li updated HBASE-20312:
--
Attachment: 1.1.2-ccsmap-number.png

> CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore
> --
>
> Key: HBASE-20312
> URL: https://issues.apache.org/jira/browse/HBASE-20312
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Xiang Wang
>Assignee: Chance Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 1.1.2-ccsmap-number.png, HBASE-20312-1.3.2.patch, 
> HBASE-20312-master.v1.patch, HBASE-20312-master.v2.patch, 
> HBASE-20312-master.v3.patch, ccsmap-branch-1.1.patch, jira1.png, jira2.png, 
> jira3.png
>
>
> Now hbase use ConcurrentSkipListMap as memstore's data structure.
> Although MemSLAB reduces memory fragment brought by key-value pairs.
> Hundred of millions key-value pairs still make young generation 
> garbage-collection(gc) stop time long.
>  
> These are 2 gc problems of ConcurrentSkipListMap:
> 1. HBase needs 3 objects to store one key-value on expectation. One 
> Index(skiplist's average node height is 1), one Node, and one KeyValue. Too 
> many objects are created for memstore.
> 2. Recent inserted KeyValue and its map structure(Index, Node) are assigned 
> on young generation.The card table (for CMS gc algorithm) or RSet(for G1 gc 
> algorithm) will change frequently on high writing throughput, which makes YGC 
> slow.
>  
> We devleoped a new skip-list map called CompactdConcurrentSkipListMap(CCSMap 
> for short),
> which provides similary features like ConcurrentSkipListMap but get rid of 
> Objects for every key-value pairs.
> CCSMap's memory structure is like this picture:
> !jira1.png!
>  
> One CCSMap consists of a certain number of Chunks. One Chunk consists of a 
> certain number of nodes. One node is corresspding one element. This element's 
> all information and its key-value is encoded on a continuous memory segment 
> without any objects.
> Features:
> 1. all insert,update,delete operations is lock-free on CCSMap.
> 2. Consume less memory, it brings 40% memory saving for 50Byte key-value.
> 3. Faster on small key-value because of better cacheline usage. 20~30% better 
> read/write troughput than ConcurrentSkipListMap for 50Byte key-value.
> CCSMap do not support recyle space when deleting element. But it doesn't 
> matter for hbase because of region flush.
> CCSMap has been running on Alibaba's hbase clusters over 17 months, it cuts 
> down YGC time significantly. here are 2 graph of before and after.
> !jira2.png!
> !jira3.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20243) [Shell] Add shell command to create a new table by cloning the existent table

2018-04-03 Thread Appy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424832#comment-16424832
 ] 

Appy commented on HBASE-20243:
--

Patch looks good, few comments.
 * don't we need shell test too?
 * use a variable NUM_SPLITS=2 and use that in variable initializations and 
asserts
 * Add tests for cases when a) source table doesn't exist, b) destination table 
exists. Basically, we should have tests for both success and failure scenarios.
 * Does it compile? I don't see FAMILY_0 and 1 in declarations in 
TestAsyncTableAdminApi.java
 * Any way we can refactor out the common code in test?

> [Shell] Add shell command to create a new table by cloning the existent table
> -
>
> Key: HBASE-20243
> URL: https://issues.apache.org/jira/browse/HBASE-20243
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: HBASE-20243.master.001.patch, 
> HBASE-20243.master.002.patch, HBASE-20243.master.003.patch, 
> HBASE-20243.master.004.patch, HBASE-20243.master.005.patch
>
>
> In the production environment, we need to create a new table every day. The 
> schema and the split keys of the table are the same as that of yesterday's 
> table, only the name of the table is different. For example, 
> x_20180321,x_20180322 etc.But now there is no convenient command to 
> do this. So we may need such a command(clone_table) to create a new table by 
> cloning the existent table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424812#comment-16424812
 ] 

Andrew Purtell commented on HBASE-18309:


If the version in the POM of hbase-error-prone is not the same as the parent 
POM I've seen issues, otherwise no.

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19488) Remove Unused Code from CollectionUtils

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424811#comment-16424811
 ] 

Hadoop QA commented on HBASE-19488:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  7m 
 8s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} hbase-common: The patch generated 0 new + 89 
unchanged - 1 fixed = 89 total (was 90) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} The patch hbase-client passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} The patch hbase-replication passed checkstyle 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} hbase-server: The patch generated 0 new + 294 
unchanged - 1 fixed = 294 total (was 295) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m  0s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
0s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}100m 
51s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense

[jira] [Commented] (HBASE-20330) ProcedureExecutor.start() gets stuck in recover lease on store.

2018-04-03 Thread Umesh Agashe (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424808#comment-16424808
 ] 

Umesh Agashe commented on HBASE-20330:
--

Thanks for the details, [~appy]. Will upload the patch soon.

> ProcedureExecutor.start() gets stuck in recover lease on store.
> ---
>
> Key: HBASE-20330
> URL: https://issues.apache.org/jira/browse/HBASE-20330
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
>
> We have instance in our internal testing where master log is getting filled 
> with following messages:
> {code}
> 2018-04-02 17:11:17,566 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recover lease on dfs file 
> hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log
> 2018-04-02 17:11:17,567 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recovered lease, attempt=0 on 
> file=hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log after 1ms
> 2018-04-02 17:11:17,574 WARN 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Unable to 
> read tracker for hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log 
> - Invalid Trailer version. got 111 expected 1
> 2018-04-02 17:11:17,576 ERROR 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Log file with 
> id=19 already exists
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /hbase/MasterProcWALs/pv2-0019.log for client 10.17.202.11 
> already exists
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:381)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2442)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> Debugging it further with [~appy] and [~avirmani], we found that when 
> WALProcedureStore#rollWriter() fails and returns false for some reason, it 
> keeps looping continuously.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20312) CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424799#comment-16424799
 ] 

stack commented on HBASE-20312:
---

bq. I guess you didn't turn off CompactingMemsotre (set 
"hbase.hregion.compacting.memstore.type" to NONE) in your testing boss, could 
you check and retry if not? 

Right. Just turned it off and it seems to work now.

What should I see? Perf, Load, and GC seems about the same in these YCSB runs. 
Thanks lads.

> CCSMap: A faster, GC-friendly, less memory Concurrent Map for memstore
> --
>
> Key: HBASE-20312
> URL: https://issues.apache.org/jira/browse/HBASE-20312
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Xiang Wang
>Assignee: Chance Li
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20312-1.3.2.patch, HBASE-20312-master.v1.patch, 
> HBASE-20312-master.v2.patch, HBASE-20312-master.v3.patch, 
> ccsmap-branch-1.1.patch, jira1.png, jira2.png, jira3.png
>
>
> Now hbase use ConcurrentSkipListMap as memstore's data structure.
> Although MemSLAB reduces memory fragment brought by key-value pairs.
> Hundred of millions key-value pairs still make young generation 
> garbage-collection(gc) stop time long.
>  
> These are 2 gc problems of ConcurrentSkipListMap:
> 1. HBase needs 3 objects to store one key-value on expectation. One 
> Index(skiplist's average node height is 1), one Node, and one KeyValue. Too 
> many objects are created for memstore.
> 2. Recent inserted KeyValue and its map structure(Index, Node) are assigned 
> on young generation.The card table (for CMS gc algorithm) or RSet(for G1 gc 
> algorithm) will change frequently on high writing throughput, which makes YGC 
> slow.
>  
> We devleoped a new skip-list map called CompactdConcurrentSkipListMap(CCSMap 
> for short),
> which provides similary features like ConcurrentSkipListMap but get rid of 
> Objects for every key-value pairs.
> CCSMap's memory structure is like this picture:
> !jira1.png!
>  
> One CCSMap consists of a certain number of Chunks. One Chunk consists of a 
> certain number of nodes. One node is corresspding one element. This element's 
> all information and its key-value is encoded on a continuous memory segment 
> without any objects.
> Features:
> 1. all insert,update,delete operations is lock-free on CCSMap.
> 2. Consume less memory, it brings 40% memory saving for 50Byte key-value.
> 3. Faster on small key-value because of better cacheline usage. 20~30% better 
> read/write troughput than ConcurrentSkipListMap for 50Byte key-value.
> CCSMap do not support recyle space when deleting element. But it doesn't 
> matter for hbase because of region flush.
> CCSMap has been running on Alibaba's hbase clusters over 17 months, it cuts 
> down YGC time significantly. here are 2 graph of before and after.
> !jira2.png!
> !jira3.png!
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424790#comment-16424790
 ] 

Hadoop QA commented on HBASE-20159:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 9s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 
36s{color} | {color:red} branch has 7 errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 
11s{color} | {color:red} patch has 7 errors when building our shaded downstream 
artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m  3s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
25s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:dba4808 |
| JIRA Issue | HBASE-20159 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917462/HBASE-20159.branch-2.addendum.v2.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c20079c9aed2 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 
12:16:42 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / 40fbecd97c |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12290/testReport/ |
| Max. process+thread count | 364 (vs. ulimit of 1) |
| modules | C: hbase-common U: hbase-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12290

[jira] [Updated] (HBASE-20298) Doc change in read/write/total accounting metrics

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20298:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+  Thanks for reviews.

> Doc change in read/write/total accounting metrics
> -
>
> Key: HBASE-20298
> URL: https://issues.apache.org/jira/browse/HBASE-20298
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20298.master.001.patch, 
> HBASE-20298.master.002.patch, HBASE-20298.master.003.patch, 
> HBASE-20298.master.004.patch
>
>
> Doc the change wrought by the parent issue. Get it up into the refguide as 
> part of the difference between old hbases and hbase2.
> The change confused me and took me a while to untangle.
> The read count is for reads that return a non-empty result now. In old 
> hbase1, we'd increment the read-count even if an empty result. This makes 
> reads look bad in YCSB runs when compared to hbase1 (see how 
> totalRequestCount in hbase2 can be way above the sum of reads+writes; it is 
> because it increments even if the row is not found).
> Let me get this into refguide otherwise poor old operators will be baffled. 
> The release note on the parent is great; it just needs to be in our guide.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Appy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424764#comment-16424764
 ] 

Appy commented on HBASE-17730:
--

Uploaded patch v1 which adds section "3.1.2. Upgrading Coprocessors to 2.0" to 
our book.

Ping [~busbey], [~mdrob] for review.

> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17730.master.001.patch
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Appy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-17730:
-
Attachment: HBASE-17730.master.001.patch

> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17730.master.001.patch
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work started] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Appy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17730 started by Appy.

> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-17730.master.001.patch
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Attachment: HBASE-20159.branch-2.addendum.v2.patch

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.addendum.v2.patch, HBASE-20159.branch-2.patch, 
> HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424757#comment-16424757
 ] 

Mike Drob commented on HBASE-20159:
---

v2 addendum: fix javadoc

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.addendum.v2.patch, HBASE-20159.branch-2.patch, 
> HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (HBASE-15227) HBase Backup Phase 3: Fault tolerance (client/server) support

2018-04-03 Thread Vladimir Rodionov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-15227.
---
Resolution: Fixed

Done.

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -
>
> Key: HBASE-15227
> URL: https://issues.apache.org/jira/browse/HBASE-15227
> Project: HBase
>  Issue Type: Task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
>  Labels: backup
> Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> System must be tolerant to faults: 
> # Backup operations MUST be atomic (no partial completion state in the backup 
> system table)
> # Process must detect any type of failures which can result in a data loss 
> (partial backup or partial restore) 
> # Proper system table state restore and cleanup must be done in case of a 
> failure
> # Additional utility to repair backup system table and corresponding file 
> system cleanup must be implemented
> h3. Backup
> h4. General FT framework implementation 
> Before actual backup operation starts, snapshot of a backup system table is 
> taken and system table is updated with *ACTIVE_SNAPSHOT* flag. The flag will 
> be removed upon backup completion. 
> In case of *any* server-side failures, client catches errors/exceptions and 
> handles them:
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes  any active snapshots of a tables being backed up (during full 
> backup we snapshot tables)
> # Restores backup system table from snapshot
> # Deletes backup system table snapshot (we read snapshot name from backup 
> system table before)
> In case of *any* client-side failures:
> Before any backup or restore operation run we check backup system table on 
> *ACTIVE_SNAPSHOT*, if flag is present, operation aborts with a message that 
> backup repair tool (see below) must be run
> h4. Backup repair tool
> The command line tool *backup repair* which executes the following steps:
> # Reads info of a last failed backup session
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes  any active snapshots of a tables being backed up (during full 
> backup we snapshot tables)
> # Restores backup system table from snapshot
> # Deletes backup system table snapshot (we read snapshot name from backup 
> system table before)
> h4. Detection of a partial loss of data
> h5. Full backup  
> Export snapshot operation (?).
> We count files and check sizes before and after DistCp run
> h5. Incremental backup 
> Conversion of WAL to HFiles, when WAL file is moved from active to archive 
> directory. The code is in place to handle this situation
> During DistCp run (same as above)
> h3. Restore
> This operation does not modify backup system table and is idempotent. No 
> special FT is required.   
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19994) Create a new class for RPC throttling exception, make it retryable.

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424732#comment-16424732
 ] 

Hadoop QA commented on HBASE-19994:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
36s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} The patch hbase-client passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} hbase-server: The patch generated 0 new + 31 
unchanged - 6 fixed = 31 total (was 37) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
21s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
17m  9s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
4s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}134m 
13s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-19994 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917434/HBASE-19994-master-v07.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f02f173e0923 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 
12:16:42 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreComm

[jira] [Commented] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424722#comment-16424722
 ] 

Hadoop QA commented on HBASE-20159:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
32s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 0s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
35s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
13m 30s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
16s{color} | {color:red} hbase-common generated 4 new + 7 unchanged - 0 fixed = 
11 total (was 7) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:dba4808 |
| JIRA Issue | HBASE-20159 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917453/HBASE-20159.branch-2.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 60933860df0d 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / 40924bb4af |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| javadoc | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12289/artifact/patchprocess/diff-javadoc-javadoc-hbase-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12289/testRepor

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424719#comment-16424719
 ] 

Mike Drob commented on HBASE-18309:
---

The error prone libs aren't getting built during the mvn install phase because 
the profile isn't enabled... I just deployed the snapshot manually which will 
be a good workaround for the next month until it ages out of the sonatype repo.

[~apurtell] - do you know why the branch-1 error prone doesn't always get 
built? The module is included via a profile, while on branch-2 root always 
includes build-support always includes error-prone.

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-17730) [DOC] Migration to 2.0 for coprocessors

2018-04-03 Thread Appy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424710#comment-16424710
 ] 

Appy commented on HBASE-17730:
--

Uploaded ascii doc of [Coprocessor design 
improvements|https://docs.google.com/document/d/1mPkM1CRRvBMZL4dBQzrus8obyvNnHhR5it2yyhiFXTg/edit#heading=h.cu9hamx6jk24]
 to dev-support/design-docs 
[here|https://github.com/apache/hbase/blob/master/dev-support/design-docs/Coprocessor_Design_Improvements-Use_composition_instead_of_inheritance-HBASE-17732.adoc].

> [DOC] Migration to 2.0 for coprocessors 
> 
>
> Key: HBASE-17730
> URL: https://issues.apache.org/jira/browse/HBASE-17730
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, migration
>Reporter: Appy
>Assignee: Appy
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Jiras breaking coprocessor compatibility should be marked with component ' 
> Coprocessor', and label 'incompatible'.
> Close to releasing 2.0, we should go through all such jiras and write down 
> steps for migrating coprocessor easily.
> The idea is, it might be very hard to fix coprocessor breakages by reverse 
> engineering errors,  but will be easier we suggest easiest way to fix 
> breakages resulting from each individual incompatible change.
> For eg. HBASE-17312 is incompatible change. It'll result in 100s of errors 
> because BaseXXXObserver classes are gone and will probably result in a lot of 
> confusion, but if we explicitly mention the fix which is just one line change 
> - replace "Foo extends BaseXXXObserver" with "Foo implements XXXObserver" - 
> it makes it very easy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20330) ProcedureExecutor.start() gets stuck in recover lease on store.

2018-04-03 Thread Appy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424703#comment-16424703
 ] 

Appy commented on HBASE-20330:
--

we can probably fix this easily by refreshing the list of log files in start of 
each loopneed to be sure.

> ProcedureExecutor.start() gets stuck in recover lease on store.
> ---
>
> Key: HBASE-20330
> URL: https://issues.apache.org/jira/browse/HBASE-20330
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
>
> We have instance in our internal testing where master log is getting filled 
> with following messages:
> {code}
> 2018-04-02 17:11:17,566 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recover lease on dfs file 
> hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log
> 2018-04-02 17:11:17,567 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recovered lease, attempt=0 on 
> file=hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log after 1ms
> 2018-04-02 17:11:17,574 WARN 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Unable to 
> read tracker for hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log 
> - Invalid Trailer version. got 111 expected 1
> 2018-04-02 17:11:17,576 ERROR 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Log file with 
> id=19 already exists
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /hbase/MasterProcWALs/pv2-0019.log for client 10.17.202.11 
> already exists
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:381)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2442)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> Debugging it further with [~appy] and [~avirmani], we found that when 
> WALProcedureStore#rollWriter() fails and returns false for some reason, it 
> keeps looping continuously.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20259) Doc configs for in-memory-compaction and add detail to in-memory-compaction logging

2018-04-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424702#comment-16424702
 ] 

Hudson commented on HBASE-20259:


Results for branch branch-2.0
[build #126 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/126/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/126//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/126//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/126//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Doc configs for in-memory-compaction and add detail to in-memory-compaction 
> logging
> ---
>
> Key: HBASE-20259
> URL: https://issues.apache.org/jira/browse/HBASE-20259
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20259.branch-2.0.ADDENDUM.patch, 
> HBASE-20259.master.001.patch, HBASE-20259.master.002.patch, 
> HBASE-20259.master.003.patch
>
>
> I set {{hbase.systemtables.compacting.memstore.type}} to NONE but it seems 
> like in-memory is still on. My table looks like this:
> {code}
> Table ycsb is ENABLED
> ycsb
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'family', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', 
> NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', 
> CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER =
> > 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', 
> > CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', 
> > COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
> {code}
> Looks like table doesn't have it on either (IN_MEMORY_COMPACTION doesn't show 
> in the above).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Release Note: 
After HBASE-20159 we allow client to use different ZK quorums by introducing 
three new properties: hbase.client.zookeeper.quorum and 
hbase.client.zookeeper.property.clientPort to specify client zookeeper 
properties (note that the combination of these two properties should be 
different from the server ZK quorums), and hbase.client.zookeeper.observer.mode 
to indicate whether the client ZK nodes are in observer mode (false by default)

HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT has been removed in HBase 3.0 and 
replaced by the correctly spelled DEFAULT_ZOOKEEPER_CLIENT_PORT.


  was:After HBASE-20159 we allow client to use different ZK quorums by 
introducing three new properties: hbase.client.zookeeper.quorum and 
hbase.client.zookeeper.property.clientPort to specify client zookeeper 
properties (note that the combination of these two properties should be 
different from the server ZK quorums), and hbase.client.zookeeper.observer.mode 
to indicate whether the client ZK nodes are in observer mode (false by default)


> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Reopened] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob reopened HBASE-20159:
---

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Attachment: HBASE-20159.branch-2.patch

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-20159:
--
Status: Patch Available  (was: Reopened)

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20159) Support using separate ZK quorums for client

2018-04-03 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424700#comment-16424700
 ] 

Mike Drob commented on HBASE-20159:
---

[~carp84] - I would like to request another addendum for branch-2 here. This 
patch renamed {{HConstants.DEFAULT_ZOOKEPER_CLIENT_PORT}} to use the correct 
spelling, but this is annotated @InterfaceAudience.Public, so probably needs a 
deprecation cycle.

> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.branch-2.patch, HBASE-20159.patch, HBASE-20159.v2.patch, 
> HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which makes us under risk that if some client connection boost exhausted 
> zookeeper, RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility to 
> synchronize necessary meta information (such as meta location and master 
> address, etc.) from server to client ZK



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-12075) Preemptive Fast Fail

2018-04-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424697#comment-16424697
 ] 

stack commented on HBASE-12075:
---

This was committed without a JIRA ID:

commit ece933fa3e272531ee443265c7aef7326e89e7cd
Author: manukranthk 
Date:   Tue Sep 23 19:15:09 2014 -0700

Implement Preemptive Fast Fail

Summary: This diff ports the Preemptive Fast Fail feature to OSS. In multi 
threaded clients, we use a feature developed on 0.89-fb branch called 
Preemptive Fast Fail. This allows the client threads which would potentially 
fail, fail fast. The idea behind this feature is that we allow, among the 
hundreds of client threads, one thread to try and establish connection with the 
regionserver and if that succeeds, we mark it as a live node again. Meanwhile, 
other threads which are trying to establish connection to the same server would 
ideally go into the timeouts which is effectively unfruitful. We can in those 
cases return appropriate exceptions to those clients instead of letting them 
retry.

Test Plan: Unit tests

Differential Revision: https://reviews.facebook.net/D24177

Signed-off-by: stack 

> Preemptive Fast Fail
> 
>
> Key: HBASE-12075
> URL: https://issues.apache.org/jira/browse/HBASE-12075
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 0.99.0, 0.98.6.1, 2.0.0
>Reporter: Manukranth Kolloju
>Assignee: Manukranth Kolloju
>Priority: Major
> Fix For: 0.99.2, 2.0.0
>
> Attachments: 0001-Add-a-test-case-for-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-HBASE-12075-Implement-Preemptive-Fast-Fail.patch, 
> 0001-Implement-Preemptive-Fast-Fail.patch, 
> 0001-Implement-Preemptive-Fast-Fail.patch, 
> 0001-Implement-Preemptive-Fast-Fail.patch, 
> 0001-Implement-Preemptive-Fast-Fail.patch, 
> 0001-Implement-Preemptive-Fast-Fail.patch, 
> HBASE-12075-Preemptive-Fast-Fail-V15.patch
>
>
> In multi threaded clients, we use a feature developed on 0.89-fb branch 
> called Preemptive Fast Fail. This allows the client threads which would 
> potentially fail, fail fast. The idea behind this feature is that we allow, 
> among the hundreds of client threads, one thread to try and establish 
> connection with the regionserver and if that succeeds, we mark it as a live 
> node again. Meanwhile, other threads which are trying to establish connection 
> to the same server would ideally go into the timeouts which is effectively 
> unfruitful. We can in those cases return appropriate exceptions to those 
> clients instead of letting them retry.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-20330) ProcedureExecutor.start() gets stuck in recover lease on store.

2018-04-03 Thread Appy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424695#comment-16424695
 ] 

Appy commented on HBASE-20330:
--

Hard to reproduce problem.
 Our hypothesis was, if a master during startup tries to create new proc-wal, 
and fails while writing header (for 19.log in this case) 
[here|https://github.com/apache/hbase/blob/d9e64aa6b83fb6ed5230b0fde86fdf8d8732e1a4/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L1041],
 then it'll get stuck in infinite loop.
 Exact seq of steps.
 # rollWriter fails to write header (for 19.log), returns false.
 # we 'continue' the loop 
[here|https://github.com/apache/hbase/blob/d9e64aa6b83fb6ed5230b0fde86fdf8d8732e1a4/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L377]
 # isRunning() will always be true, list of oldLogs is not refreshed (it's 
still till 18.log)
 # renew leases on all existing log files till 18, try to create 19
 # since it was already created last time, rollWriter returns false and we 
'continue' from step 2 again.

> ProcedureExecutor.start() gets stuck in recover lease on store.
> ---
>
> Key: HBASE-20330
> URL: https://issues.apache.org/jira/browse/HBASE-20330
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
>
> We have instance in our internal testing where master log is getting filled 
> with following messages:
> {code}
> 2018-04-02 17:11:17,566 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recover lease on dfs file 
> hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log
> 2018-04-02 17:11:17,567 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: 
> Recovered lease, attempt=0 on 
> file=hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log after 1ms
> 2018-04-02 17:11:17,574 WARN 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Unable to 
> read tracker for hdfs://ns1/hbase/MasterProcWALs/pv2-0018.log 
> - Invalid Trailer version. got 111 expected 1
> 2018-04-02 17:11:17,576 ERROR 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Log file with 
> id=19 already exists
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /hbase/MasterProcWALs/pv2-0019.log for client 10.17.202.11 
> already exists
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:381)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2442)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> Debugging it further with [~appy] and [~avirmani], we found that when 
> WALProcedureStore#rollWriter() fails and returns false for some reason, it 
> keeps looping continuously.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2018-04-03 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424694#comment-16424694
 ] 

Zach York commented on HBASE-18309:
---

I'm not very familiar with the jenkins structure yet, but it looks like it is 
unable to pull this jar from the nexus repo.

 

[~stack] [~busbey] or anyone else have a clearer idea of why it isn't finding 
the jar there?

The error is below:

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project hbase-server: Execution default-compile of goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile failed: Plugin 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1 or one of its dependencies 
could not be resolved: Could not find artifact 
org.apache.hbase:hbase-error-prone:jar:1.5.0-SNAPSHOT in Nexus 
(http://repository.apache.org/snapshots) -> [Help 1]

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
>Priority: Major
> Fix For: 2.0.0-beta-1, 2.0.0
>
> Attachments: HBASE-18309.addendum.patch, 
> HBASE-18309.branch-1.001.patch, HBASE-18309.branch-1.002.patch, 
> HBASE-18309.branch-1.003.patch, HBASE-18309.branch-1.004.patch, 
> HBASE-18309.branch-1.005.patch, HBASE-18309.branch-1.006.patch, 
> HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, 
> HBASE-18309.master.004.patch, HBASE-18309.master.005.patch, 
> HBASE-18309.master.006.patch, HBASE-18309.master.007.patch, 
> HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, 
> HBASE-18309.master.010.patch, HBASE-18309.master.011.patch, 
> HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning sub directories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-10933) hbck -fixHdfsOrphans is not working properly it throws null pointer exception

2018-04-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-10933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-10933:
--
Fix Version/s: (was: 2.0.0)

> hbck -fixHdfsOrphans is not working properly it throws null pointer exception
> -
>
> Key: HBASE-10933
> URL: https://issues.apache.org/jira/browse/HBASE-10933
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Affects Versions: 0.94.16, 0.98.2
>Reporter: Deepak Sharma
>Assignee: Kashif
>Priority: Critical
> Attachments: HBASE-10933-0.94-v1.patch, HBASE-10933-0.94-v2.patch, 
> HBASE-10933-trunk-v1.patch, HBASE-10933-trunk-v2.patch, TestResults-0.94.txt, 
> TestResults-trunk.txt
>
>
> if we regioninfo file is not existing in hbase region then if we run hbck 
> repair or hbck -fixHdfsOrphans
> then it is not able to resolve this problem it throws null pointer exception
> {code}
> 2014-04-08 20:11:49,750 INFO  [main] util.HBaseFsck 
> (HBaseFsck.java:adoptHdfsOrphans(470)) - Attempting to handle orphan hdfs 
> dir: 
> hdfs://10.18.40.28:54310/hbase/TestHdfsOrphans1/5a3de9ca65e587cb05c9384a3981c950
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$000(HBaseFsck.java:1939)
>   at 
> org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:497)
>   at 
> org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:471)
>   at 
> org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:591)
>   at 
> org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:369)
>   at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:447)
>   at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3769)
>   at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3587)
>   at 
> com.huawei.isap.test.smartump.hadoop.hbase.HbaseHbckRepair.repairToFixHdfsOrphans(HbaseHbckRepair.java:244)
>   at 
> com.huawei.isap.test.smartump.hadoop.hbase.HbaseHbckRepair.setUp(HbaseHbckRepair.java:84)
>   at junit.framework.TestCase.runBare(TestCase.java:132)
>   at junit.framework.TestResult$1.protect(TestResult.java:110)
>   at junit.framework.TestResult.runProtected(TestResult.java:128)
>   at junit.framework.TestResult.run(TestResult.java:113)
>   at junit.framework.TestCase.run(TestCase.java:124)
>   at junit.framework.TestSuite.runTest(TestSuite.java:243)
>   at junit.framework.TestSuite.run(TestSuite.java:238)
>   at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
>   at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>   at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> {code}
> problem i got it is because since in HbaseFsck class 
> {code}
>  private void adoptHdfsOrphan(HbckInfo hi)
> {code}
> we are intializing tableinfo using SortedMap tablesInfo 
> object
> {code}
> TableInfo tableInfo = tablesInfo.get(tableName);
> {code}
> but  in private SortedMap loadHdfsRegionInfos()
> {code}
>  for (HbckInfo hbi: hbckInfos) {
>   if (hbi.getHdfsHRI() == null) {
> // was an orphan
> continue;
>   }
> {code}
> we have check if a region is orphan then that table will can not be added in 
> SortedMap tablesInfo
> so later while using this we get null pointer exception



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

1 2 3 >

1 - 100 of 269 matches

Mail list logo