[jira] [Updated] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace

2017-11-22 Thread xinxin fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinxin fan updated HBASE-19336:
---
Summary: Improve rsgroup to allow assign all tables within a specified 
namespace by only writing namespace  (was: Improve rsgroup to allow assign all 
tables within a specified namespace from one group to another )

> Improve rsgroup to allow assign all tables within a specified namespace by 
> only writing namespace
> -
>
> Key: HBASE-19336
> URL: https://issues.apache.org/jira/browse/HBASE-19336
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-4
>Reporter: xinxin fan
>Assignee: xinxin fan
> Attachments: HBASE-19336-master.patch
>
>
> Currently, users can only assign tables within a namespace from one group to 
> another by writing all table names in the move_tables_rsgroup command. 
> Allowing all tables within a specified namespace to be assigned by writing 
> only the namespace name is useful.
> Usage as follows:
> {code:java}
> hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
> Took 2.2211 seconds
> {code}
> {code:java}
> hbase(main):051:0* move_servers_tables_rsgroup 
> 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
> Took 15.3710 seconds 
> {code}
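
For illustration only, a minimal sketch of how a '@namespace' entry might be 
expanded into concrete tables before the move (the '@' convention is from this 
issue; the helper and its client-side placement are assumptions, not the 
attached patch):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

// Expand each spec: '@ns1' means every table in namespace ns1; anything else
// is taken as a literal table name.
List<TableName> resolveTableSpecs(Admin admin, List<String> specs) throws IOException {
  List<TableName> tables = new ArrayList<>();
  for (String spec : specs) {
    if (spec.startsWith("@")) {
      Collections.addAll(tables, admin.listTableNamesByNamespace(spec.substring(1)));
    } else {
      tables.add(TableName.valueOf(spec));
    }
  }
  return tables;
}
{code}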



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace

2017-11-22 Thread xinxin fan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263913#comment-16263913
 ] 

xinxin fan commented on HBASE-19336:


Here is the patch

> Improve rsgroup to allow assign all tables within a specified namespace by 
> only writing namespace
> -
>
> Key: HBASE-19336
> URL: https://issues.apache.org/jira/browse/HBASE-19336
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-4
>Reporter: xinxin fan
>Assignee: xinxin fan
> Attachments: HBASE-19336-master.patch
>
>
> Currently, users can only assign tables within a namespace from one group to 
> another by writing all table names in the move_tables_rsgroup command. 
> Allowing all tables within a specified namespace to be assigned by writing 
> only the namespace name is useful.
> Usage as follows:
> {code:java}
> hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
> Took 2.2211 seconds
> {code}
> {code:java}
> hbase(main):051:0* move_servers_tables_rsgroup 
> 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
> Took 15.3710 seconds 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another

2017-11-22 Thread xinxin fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-19336 started by xinxin fan.
--
> Improve rsgroup to allow assign all tables within a specified namespace from 
> one group to another 
> --
>
> Key: HBASE-19336
> URL: https://issues.apache.org/jira/browse/HBASE-19336
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-4
>Reporter: xinxin fan
>Assignee: xinxin fan
> Attachments: HBASE-19336-master.patch
>
>
> Currently, users can only assign tables within a namespace from one group to 
> another by writing all table names in the move_tables_rsgroup command. 
> Allowing all tables within a specified namespace to be assigned by writing 
> only the namespace name is useful.
> Usage as follows:
> {code:java}
> hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
> Took 2.2211 seconds
> {code}
> {code:java}
> hbase(main):051:0* move_servers_tables_rsgroup 
> 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
> Took 15.3710 seconds 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another

2017-11-22 Thread xinxin fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinxin fan updated HBASE-19336:
---
Attachment: HBASE-19336-master.patch

> Improve rsgroup to allow assign all tables within a specified namespace from 
> one group to another 
> --
>
> Key: HBASE-19336
> URL: https://issues.apache.org/jira/browse/HBASE-19336
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-4
>Reporter: xinxin fan
>Assignee: xinxin fan
> Attachments: HBASE-19336-master.patch
>
>
> Currently, users can only assign tables within a namespace from one group to 
> another by writing all table names in the move_tables_rsgroup command. 
> Allowing all tables within a specified namespace to be assigned by writing 
> only the namespace name is useful.
> Usage as follows:
> {code:java}
> hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
> Took 2.2211 seconds
> {code}
> {code:java}
> hbase(main):051:0* move_servers_tables_rsgroup 
> 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
> Took 15.3710 seconds 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another

2017-11-22 Thread xinxin fan (JIRA)
xinxin fan created HBASE-19336:
--

 Summary: Improve rsgroup to allow assign all tables within a 
specified namespace from one group to another 
 Key: HBASE-19336
 URL: https://issues.apache.org/jira/browse/HBASE-19336
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup
Affects Versions: 2.0.0-alpha-4
Reporter: xinxin fan


Currently, users can only assign tables within a namespace from one group to 
another by writing all table names in the move_tables_rsgroup command. Allowing 
all tables within a specified namespace to be assigned by writing only the 
namespace name is useful.

Usage as follows:


{code:java}
hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
Took 2.2211 seconds
{code}


{code:java}
hbase(main):051:0* move_servers_tables_rsgroup 
'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
Took 15.3710 seconds 
{code}





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another

2017-11-22 Thread xinxin fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xinxin fan reassigned HBASE-19336:
--

Assignee: xinxin fan

> Improve rsgroup to allow assign all tables within a specified namespace from 
> one group to another 
> --
>
> Key: HBASE-19336
> URL: https://issues.apache.org/jira/browse/HBASE-19336
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 2.0.0-alpha-4
>Reporter: xinxin fan
>Assignee: xinxin fan
>
> Currently, users can only assign tables within a namespace from one group to 
> another by writing all table names in the move_tables_rsgroup command. 
> Allowing all tables within a specified namespace to be assigned by writing 
> only the namespace name is useful.
> Usage as follows:
> {code:java}
> hbase(main):055:0> move_tables_rsgroup 'default',['@ns1']
> Took 2.2211 seconds
> {code}
> {code:java}
> hbase(main):051:0* move_servers_tables_rsgroup 
> 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3']
> Took 15.3710 seconds 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer

2017-11-22 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-16868:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks all for reviewing. Pushed to master and branch-2.

> Add a replicate_all flag to avoid misuse the namespaces and table-cfs config 
> of replication peer
> 
>
> Key: HBASE-16868
> URL: https://issues.apache.org/jira/browse/HBASE-16868
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-16868.master.001.patch, 
> HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, 
> HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, 
> HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, 
> HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, 
> HBASE-16868.master.010.patch, HBASE-16868.master.011.patch
>
>
> First, add a new peer by shell command:
> {code}
> add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase"
> {code}
> If we don't set namespaces and table-cfs in the peer config, it means all 
> tables are replicated to the peer cluster.
> Then append a table to the peer config:
> {code}
> append_peer_tableCFs '1', {"table1" => []}
> {code}
> Now this peer will replicate only table1 to the peer cluster: it changed from 
> replicating all tables in the cluster to replicating a single table. This is 
> very easy to misuse in a production cluster, so we should avoid appending a 
> table to a peer which replicates all tables.
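
A minimal sketch of the guard this flag enables, assuming the flag is exposed 
as ReplicationPeerConfig#replicateAllUserTables (the placement is hypothetical; 
see the committed patch for the real check):

{code:java}
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

// Sketch only: refuse table-cfs edits while the peer still replicates
// everything, so the replication scope cannot shrink silently.
ReplicationPeerConfig peerConfig = admin.getReplicationPeerConfig("1");
if (peerConfig.replicateAllUserTables()) {
  throw new DoNotRetryIOException("Peer 1 replicates all user tables; "
      + "set replicate_all to false before appending table-cfs");
}
{code}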



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263901#comment-16263901
 ] 

Ted Yu commented on HBASE-15320:


Thanks for the hard work, Mike.

Please wait for more reviews from other committers.

> HBase connector for Kafka Connect
> -
>
> Key: HBASE-15320
> URL: https://issues.apache.org/jira/browse/HBASE-15320
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Andrew Purtell
>Assignee: Mike Wingert
>  Labels: beginner
> Fix For: 3.0.0
>
> Attachments: HBASE-15320.master.1.patch, HBASE-15320.master.2.patch, 
> HBASE-15320.master.3.patch, HBASE-15320.master.4.patch, 
> HBASE-15320.master.5.patch, HBASE-15320.master.6.patch, 
> HBASE-15320.master.7.patch, HBASE-15320.master.8.patch, 
> HBASE-15320.master.8.patch, HBASE-15320.master.9.patch, HBASE-15320.pdf, 
> HBASE-15320.pdf
>
>
> Implement an HBase connector with source and sink tasks for the Connect 
> framework (http://docs.confluent.io/2.0.0/connect/index.html) available in 
> Kafka 0.9 and later.
> See also: 
> http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
> An HBase source 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task)
>  could be implemented as a replication endpoint or WALObserver, publishing 
> cluster wide change streams from the WAL to one or more topics, with 
> configurable mapping and partitioning of table changes to topics.  
> An HBase sink task 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would 
> persist, with optional transformation (JSON? Avro?, map fields to native 
> schema?), Kafka SinkRecords into HBase tables.
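
To make the moving parts concrete, here is a skeletal sink task against the 
Connect API (illustrative only; the table/column names, the 'hbase.table' 
property, and the byte[] key/value assumption are mine, not the attached 
patches'):

{code:java}
import java.io.IOException;
import java.util.Collection;
import java.util.Map;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class HBaseSinkTask extends SinkTask {
  private Connection connection;
  private Table table;

  @Override
  public void start(Map<String, String> props) {
    try {
      connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
      // 'hbase.table' is an assumed connector property, not a real config key.
      table = connection.getTable(TableName.valueOf(props.get("hbase.table")));
    } catch (IOException e) {
      throw new ConnectException("Could not connect to HBase", e);
    }
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    try {
      for (SinkRecord record : records) {
        // Assumes byte[] keys/values (e.g. ByteArrayConverter); the real
        // schema mapping (JSON/Avro) is what the patches address.
        Put p = new Put((byte[]) record.key());
        p.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), (byte[]) record.value());
        table.put(p);
      }
    } catch (IOException e) {
      throw new ConnectException("HBase write failed", e);
    }
  }

  @Override
  public void stop() {
    try {
      if (table != null) table.close();
      if (connection != null) connection.close();
    } catch (IOException ignored) {
      // best-effort close
    }
  }

  @Override
  public String version() {
    return "sketch";
  }
}
{code}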



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19266) TestAcidGuarantees should cover adaptive in-memory compaction

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263888#comment-16263888
 ] 

Ted Yu commented on HBASE-19266:


w.r.t. the 'memstoreSize to a negative value' error, the first occurrence is in 
TestAcidGuarantees#testMixedAtomicity.

However, if I run the subtest alone, it passes with the EAGER policy.

> TestAcidGuarantees should cover adaptive in-memory compaction
> -
>
> Key: HBASE-19266
> URL: https://issues.apache.org/jira/browse/HBASE-19266
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Attachments: HBASE-19266.v0.patch
>
>
> Currently TestAcidGuarantees populates 3 policies of (in-memory) compaction.
> Adaptive in-memory compaction is new and should be added as 4th compaction 
> policy.
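
For reference, a sketch of how a test run can select the new policy, assuming 
the standard memstore-compaction switch (hbase.hregion.compacting.memstore.type):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MemoryCompactionPolicy;
import org.apache.hadoop.hbase.regionserver.CompactingMemStore;

// Opt the test cluster into ADAPTIVE alongside the existing NONE/BASIC/EAGER runs.
Configuration conf = HBaseConfiguration.create();
conf.set(CompactingMemStore.COMPACTING_MEMSTORE_TYPE_KEY,
    MemoryCompactionPolicy.ADAPTIVE.name());
{code}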



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19325) Pass a list of server name to postClearDeadServers

2017-11-22 Thread Guangxu Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-19325:
--
Attachment: HBASE-19325.branch-1.001.patch

The failed UT is not related. Retrying.

> Pass a list of server name to postClearDeadServers
> --
>
> Key: HBASE-19325
> URL: https://issues.apache.org/jira/browse/HBASE-19325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Attachments: HBASE-19325.branch-1.001.patch, 
> HBASE-19325.branch-1.001.patch, HBASE-19325.branch-2.001.patch
>
>
> On the tail of HBASE-18131, [~chia7712] said: 
> {quote}
> (Revisiting the AccessController reminded me of this issue.) 
> Could we remove the duplicate code on the server side? Why not pass a list of 
> server names to postClearDeadServers and postListDeadServers?
> {quote}
> The duplicate code has been removed in HBASE-19131. Now we pass a list of 
> server names to postClearDeadServers.
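
For context, a hedged sketch of the coprocessor surface this enables; the 
parameter list follows the issue's description and may not match the committed 
signature exactly:

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.MasterObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;

// An observer that can now see exactly which dead servers were cleared.
public class DeadServerAuditObserver implements MasterObserver {
  private static final Log LOG = LogFactory.getLog(DeadServerAuditObserver.class);

  @Override
  public void postClearDeadServers(ObserverContext<MasterCoprocessorEnvironment> ctx,
      List<ServerName> servers, List<ServerName> notClearedServers) throws IOException {
    for (ServerName sn : servers) {
      LOG.info("Cleared dead server " + sn);
    }
  }
}
{code}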



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263872#comment-16263872
 ] 

Guanghao Zhang commented on HBASE-16868:


OK. All UTs passed. Will fix the checkstyle issue on commit.

> Add a replicate_all flag to avoid misuse the namespaces and table-cfs config 
> of replication peer
> 
>
> Key: HBASE-16868
> URL: https://issues.apache.org/jira/browse/HBASE-16868
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-16868.master.001.patch, 
> HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, 
> HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, 
> HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, 
> HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, 
> HBASE-16868.master.010.patch, HBASE-16868.master.011.patch
>
>
> First, add a new peer by shell command:
> {code}
> add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase"
> {code}
> If we don't set namespaces and table-cfs in the peer config, it means all 
> tables are replicated to the peer cluster.
> Then append a table to the peer config:
> {code}
> append_peer_tableCFs '1', {"table1" => []}
> {code}
> Now this peer will replicate only table1 to the peer cluster: it changed from 
> replicating all tables in the cluster to replicating a single table. This is 
> very easy to misuse in a production cluster, so we should avoid appending a 
> table to a peer which replicates all tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263871#comment-16263871
 ] 

ramkrishna.s.vasudevan commented on HBASE-16890:


I tried this out (still on a single-node cluster):
{code}
nohup ./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred 
--presplit=50 --size=50 --columns=50 --valueSize=200 --writeToWAL=true 
--bloomFilter=NONE randomWrite 50
{code}
AsyncWAL is faster in terms of throughput (completion time).
AsyncWAL
Avg: 1103134ms
FSHLog
Avg: 1280875ms
Though we have more columns, the performance seems better for AsyncWAL. For 
now I have only one node; I can try with multiple nodes next week or so. 
Previously this one-node test showed FSHLog as faster, and now that is no 
longer what I see. I have not dug into the logs as I did previously, since I 
am currently doing something else, but I will get to that next week or so. 
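
For anyone reproducing this comparison, the WAL implementation is selected via 
the hbase.wal.provider setting (a config sketch using the standard HBase 2.0 
values):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Switch between the two WAL implementations being compared above.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.wal.provider", "asyncfs");       // AsyncFSWAL
// conf.set("hbase.wal.provider", "filesystem"); // classic FSHLog
{code}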

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze that and see if we can 
> fix it.
> See the discussion in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263844#comment-16263844
 ] 

Appy commented on HBASE-19335:
--

Testing
{{for i in `seq 1 10`; do mvn test 
-Dtest=TestRegionObserverInterface#testRecovery -pl hbase-server; done}}
Without the patch, the test failed 8 times and passed 2 times (with a timeout of 60s).
With the patch, the test passed 10 times. Test runtime was ~20s per run.


> Fix waitUntilAllRegionsAssigned
> ---
>
> Key: HBASE-19335
> URL: https://issues.apache.org/jira/browse/HBASE-19335
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-19335.master.001.patch
>
>
> Found when debugging flaky test TestRegionObserverInterface#testRecovery.
> In the end, the test does the following:
> - Kills the RS
> - Waits for all regions to be assigned
> - Some validation (unrelated)
> - Cleanup: delete table.
> {noformat}
>   cluster.killRegionServer(rs1.getRegionServer().getServerName());
>   Threads.sleep(1000); // Let the kill soak in.
>   util.waitUntilAllRegionsAssigned(tableName);
>   LOG.info("All regions assigned");
>   verifyMethodResult(SimpleRegionObserver.class,
> new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
> "getCtPreWALRestore",
> "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
> tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
> } finally {
>   util.deleteTable(tableName);
>   table.close();
> }
>   }
> {noformat}
> However, the test logs show that Assigns overlapped with Unassigns. As a 
> result, regions ended up stuck in RIT and the test timed out.
> The Assigns came from the server crash recovery and the Unassigns from the 
> deleteTable cleanup.
> This raises the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
> not wait until recovery was complete?
> Answer: that function appears to be meant only for sunny-day scenarios, not 
> for crashes. It iterates over meta and just [checks for *some value* in the 
> server 
> column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
>  which is obviously present and equal to the server that was just killed.
> This bug likely affects other fault-tolerance tests too, so fixing it may fix 
> more than one test.
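
A hedged sketch of the liveness check the fix needs (not the attached patch): 
a region should only count as assigned if its info:server column points at a 
server that is currently alive, not at any leftover value.

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// True only if this meta row's server column names a live regionserver.
boolean assignedToLiveServer(Result metaRow, Admin admin) throws IOException {
  byte[] server = metaRow.getValue(HConstants.CATALOG_FAMILY, HConstants.SERVER_QUALIFIER);
  if (server == null) {
    return false; // no location written yet
  }
  String hostAndPort = Bytes.toString(server);
  for (ServerName live : admin.getClusterStatus().getServers()) {
    if (hostAndPort.equals(live.getHostname() + ":" + live.getPort())) {
      return true;
    }
  }
  return false; // location points at a dead or unknown server
}
{code}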



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263840#comment-16263840
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
12s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged - 
0 fixed = 6 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 3s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
54m  7s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 
24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898980/HBASE-19290.master.004.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux df6057c46798 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / cdc2bb17ff |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9984/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9984/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263832#comment-16263832
 ] 

ramkrishna.s.vasudevan commented on HBASE-18946:


bq. Looking in the patch, are we doing assign placement inside 
CreateTableProcedure still?
No, we are not doing assign placement in the create-table procedure. We are 
just separating out the regions so that primaries are assigned first and then 
replicas. There is no state maintained in the LB or in the assign process as 
in the first patch.
bq. Could we pass the AM the new table regions and ask it to return us plans to 
use assigning?
I think that is what we are doing now in this patch - we pass the regions and 
get the right server for them, ensuring replicas don't sit together (see the 
sketch below).
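
Roughly, the separation described reads like this (newTableRegions and assign 
are hypothetical placeholders, not the patch's identifiers):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.RegionReplicaUtil;

// Phase 1 assigns primaries so the balancer spreads them; phase 2 then places
// each replica relative to where its primary landed.
List<HRegionInfo> primaries = new ArrayList<>();
List<HRegionInfo> replicas = new ArrayList<>();
for (HRegionInfo region : newTableRegions) {
  if (RegionReplicaUtil.isDefaultReplica(region)) {
    primaries.add(region);
  } else {
    replicas.add(region);
  }
}
assign(primaries);
assign(replicas);
{code}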

> Stochastic load balancer assigns replica regions to the same RS
> ---
>
> Key: HBASE-18946
> URL: https://issues.apache.org/jira/browse/HBASE-18946
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-3
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-18946.patch, HBASE-18946.patch, 
> HBASE-18946_2.patch, HBASE-18946_2.patch, 
> TestRegionReplicasWithRestartScenarios.java
>
>
> Trying out region replicas and their assignment, I can see that sometimes the 
> default Stochastic load balancer assigns replica regions to the same RS. 
> This happens when we have 3 RSs checked in and a table with 3 replicas. When 
> an RS goes down, replicas being assigned to the same RS is acceptable, but 
> when we have enough RSs available this behaviour is undesirable and defeats 
> the purpose of replicas. 
> [~huaxiang] and [~enis]. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-19335:
-
Status: Patch Available  (was: Open)

> Fix waitUntilAllRegionsAssigned
> ---
>
> Key: HBASE-19335
> URL: https://issues.apache.org/jira/browse/HBASE-19335
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-19335.master.001.patch
>
>
> Found when debugging flaky test TestRegionObserverInterface#testRecovery.
> In the end, the test does the following:
> - Kills the RS
> - Waits for all regions to be assigned
> - Some validation (unrelated)
> - Cleanup: delete table.
> {noformat}
>   cluster.killRegionServer(rs1.getRegionServer().getServerName());
>   Threads.sleep(1000); // Let the kill soak in.
>   util.waitUntilAllRegionsAssigned(tableName);
>   LOG.info("All regions assigned");
>   verifyMethodResult(SimpleRegionObserver.class,
> new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
> "getCtPreWALRestore",
> "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
> tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
> } finally {
>   util.deleteTable(tableName);
>   table.close();
> }
>   }
> {noformat}
> However, the test logs show that Assigns overlapped with Unassigns. As a 
> result, regions ended up stuck in RIT and the test timed out.
> The Assigns came from the server crash recovery and the Unassigns from the 
> deleteTable cleanup.
> This raises the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
> not wait until recovery was complete?
> Answer: that function appears to be meant only for sunny-day scenarios, not 
> for crashes. It iterates over meta and just [checks for *some value* in the 
> server 
> column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
>  which is obviously present and equal to the server that was just killed.
> This bug likely affects other fault-tolerance tests too, so fixing it may fix 
> more than one test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-19335:
-
Attachment: HBASE-19335.master.001.patch

> Fix waitUntilAllRegionsAssigned
> ---
>
> Key: HBASE-19335
> URL: https://issues.apache.org/jira/browse/HBASE-19335
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-19335.master.001.patch
>
>
> Found when debugging flaky test TestRegionObserverInterface#testRecovery.
> In the end, the test does the following:
> - Kills the RS
> - Waits for all regions to be assigned
> - Some validation (unrelated)
> - Cleanup: delete table.
> {noformat}
>   cluster.killRegionServer(rs1.getRegionServer().getServerName());
>   Threads.sleep(1000); // Let the kill soak in.
>   util.waitUntilAllRegionsAssigned(tableName);
>   LOG.info("All regions assigned");
>   verifyMethodResult(SimpleRegionObserver.class,
> new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
> "getCtPreWALRestore",
> "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
> tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
> } finally {
>   util.deleteTable(tableName);
>   table.close();
> }
>   }
> {noformat}
> However, the test logs show that Assigns overlapped with Unassigns. As a 
> result, regions ended up stuck in RIT and the test timed out.
> The Assigns came from the server crash recovery and the Unassigns from the 
> deleteTable cleanup.
> This raises the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
> not wait until recovery was complete?
> Answer: that function appears to be meant only for sunny-day scenarios, not 
> for crashes. It iterates over meta and just [checks for *some value* in the 
> server 
> column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
>  which is obviously present and equal to the server that was just killed.
> This bug likely affects other fault-tolerance tests too, so fixing it may fix 
> more than one test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263830#comment-16263830
 ] 

Hadoop QA commented on HBASE-19325:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 7s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green} branch-1 passed with JDK v1.8.0_141 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green} branch-1 passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 3s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
41s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
12s{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} branch-1 passed with JDK v1.8.0_141 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} branch-1 passed with JDK v1.7.0_151 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed with JDK v1.8.0_141 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
37s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
41m  0s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed with JDK v1.8.0_141 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m  5s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
2s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| 

[jira] [Commented] (HBASE-19319) Fix bug in synchronizing over ProcedureEvent

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263828#comment-16263828
 ] 

Hadoop QA commented on HBASE-19319:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
 8s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} hbase-procedure: The patch generated 3 new + 14 
unchanged - 2 fixed = 17 total (was 16) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} hbase-server: The patch generated 0 new + 55 
unchanged - 1 fixed = 55 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
57s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
54m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
10s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 18s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19319 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898978/HBASE-19319.master.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9e9f334e579b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263826#comment-16263826
 ] 

Hadoop QA commented on HBASE-16868:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
39s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  4m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  8m 
37s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
37s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 217 unchanged 
- 6 fixed = 218 total (was 223) {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m 
17s{color} | {color:red} The patch generated 9 new + 295 unchanged - 25 fixed = 
304 total (was 320) {color} |
| {color:red}-1{color} | {color:red} ruby-lint {color} | {color:red}  0m 
16s{color} | {color:red} The patch generated 43 new + 315 unchanged - 28 fixed 
= 358 total (was 343) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  8m 
22s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
99m  4s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
3s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
15s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 
27s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
39s{color} | {color:green} hbase-shell in the patch passed. {color} |
| 

[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263825#comment-16263825
 ] 

ramkrishna.s.vasudevan commented on HBASE-16890:


One question on the YCSB run: since you measure write performance, is the 
batching of mutations disabled? I think only then can we measure the correct 
latency, right?

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze that and see if we can 
> fix it.
> See the discussion in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19159) Backup should check permission for snapshot copy in advance

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263819#comment-16263819
 ] 

Ted Yu commented on HBASE-19159:


I took a brief look at how Hadoop tests a similar scenario.
Please refer to:
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestAclWithSnapshot.java
{code}
  private static void assertDirPermissionDenied(FileSystem fs,
  UserGroupInformation user, Path pathToCheck) throws Exception {
try {
  fs.listStatus(pathToCheck);
  fail("expected AccessControlException for user " + user + ", path = " +
pathToCheck);
} catch (AccessControlException e) {
{code}
See if you can borrow something from the above test.
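
For the advance check itself, one hedged sketch (destDir and conf are assumed 
to be the snapshot-copy target and job config; the real patch may hook in 
elsewhere):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;

// Fail fast if the backup user cannot write the copy destination, instead of
// failing deep inside ExportSnapshot after the snapshot was already taken.
void checkDestWritable(Path destDir, Configuration conf) throws IOException {
  FileSystem destFs = destDir.getFileSystem(conf);
  try {
    destFs.access(destDir, FsAction.WRITE); // throws if access is denied
  } catch (AccessControlException e) {
    throw new IOException("No write permission on backup destination " + destDir
        + "; aborting before taking the snapshot", e);
  }
}
{code}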

> Backup should check permission for snapshot copy in advance
> ---
>
> Key: HBASE-19159
> URL: https://issues.apache.org/jira/browse/HBASE-19159
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Janos Gub
>Priority: Minor
> Attachments: initial_patch.txt
>
>
> When the user running the backup doesn't have permission to copy the 
> snapshot, he/she would see:
> {code}
> 2017-11-02 18:21:33,654 ERROR [main] util.AbstractHBaseTool: Error running 
> command-line tool
> org.apache.hadoop.hbase.snapshot.ExportSnapshotException: Failed to copy the 
> snapshot directory: 
> from=hdfs://ctr-e134-1499953498516-263664-01-03.hwx.site:8020/apps/hbase/data/.hbase-snapshot/snapshot_1509646891251_default_IntegrationTestBackupRestore.table2
>  
> to=hdfs://ctr-e134-1499953498516-263664-01-03.hwx.site:8020/user/root/test-data/fb919a6f-3cb4-4d57-bbcf-561d6e5b3ae8/backupIT/backup_1509646884252/default/IntegrationTestBackupRestore.table2/.hbase-snapshot/.tmp/snapshot_1509646891251_default_IntegrationTestBackupRestore.table2
>   at 
> org.apache.hadoop.hbase.snapshot.ExportSnapshot.doWork(ExportSnapshot.java:1009)
>   at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:386)
>   at 
> org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.snapshotCopy(FullTableBackupClient.java:103)
>   at 
> org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:175)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBackupRestore.runTest(IntegrationTestBackupRestore.java:180)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBackupRestore.testBackupRestore(IntegrationTestBackupRestore.java:134)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBackupRestore.runTestFromCommandLine(IntegrationTestBackupRestore.java:263)
> {code}
> It would be more user-friendly if the permission were checked before taking 
> the snapshot.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19092) Make Tag IA.LimitedPrivate and expose for CPs

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263817#comment-16263817
 ] 

ramkrishna.s.vasudevan commented on HBASE-19092:


Thanks for the reviews. Any other comments? I will commit it and then work on 
the branch-2 patch.

> Make Tag IA.LimitedPrivate and expose for CPs
> -
>
> Key: HBASE-19092
> URL: https://issues.apache.org/jira/browse/HBASE-19092
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19092-branch-2.patch, 
> HBASE-19092-branch-2_5.patch, HBASE-19092-branch-2_5.patch, 
> HBASE-19092.branch-2.0.02.patch, HBASE-19092_001-branch-2.patch, 
> HBASE-19092_001.patch, HBASE-19092_002-branch-2.patch, HBASE-19092_002.patch, 
> HBASE-19092_004.patch, HBASE-19092_3.patch, HBASE-19092_4.patch
>
>
> We need to make tags LimitedPrivate, as some use cases such as the timeline 
> server are trying to use tags. The same topic was discussed on dev@ and also 
> in HBASE-18995.
> Shall we target this for beta-1? cc [~saint@gmail.com].
> Once we do this, all related Util methods and APIs should also move to 
> LimitedPrivate Util classes.
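
A sketch of what the LimitedPrivate tag surface gives a coprocessor, assuming 
the RawCell#getTags()/Tag accessors discussed in this issue; 'cell' stands for 
a Cell the CP was handed:

{code:java}
import java.util.Iterator;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.RawCell;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.TagType;

// Read tags through the LimitedPrivate interface, not Cell internals.
if (cell instanceof RawCell) {
  Iterator<Tag> tags = ((RawCell) cell).getTags();
  while (tags.hasNext()) {
    Tag tag = tags.next();
    if (tag.getType() == TagType.VISIBILITY_TAG_TYPE) {
      // e.g. react to the visibility tag
    }
  }
}
{code}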



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19092) Make Tag IA.LimitedPrivate and expose for CPs

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263816#comment-16263816
 ] 

ramkrishna.s.vasudevan commented on HBASE-19092:


Ya, getType should be in RawCell, but that deals with the actual type and not 
the byte, so I thought those were unrelated to this JIRA's description.
bq. Returning a RawCellBuilder sounds good. In it you would not allow an option 
for setting sequenceid?
Yes. Don't allow seqId to be set in the CP context. 

> Make Tag IA.LimitedPrivate and expose for CPs
> -
>
> Key: HBASE-19092
> URL: https://issues.apache.org/jira/browse/HBASE-19092
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19092-branch-2.patch, 
> HBASE-19092-branch-2_5.patch, HBASE-19092-branch-2_5.patch, 
> HBASE-19092.branch-2.0.02.patch, HBASE-19092_001-branch-2.patch, 
> HBASE-19092_001.patch, HBASE-19092_002-branch-2.patch, HBASE-19092_002.patch, 
> HBASE-19092_004.patch, HBASE-19092_3.patch, HBASE-19092_4.patch
>
>
> We need to make tags LimitedPrivate, as some use cases such as the timeline 
> server are trying to use tags. The same topic was discussed on dev@ and also 
> in HBASE-18995.
> Shall we target this for beta-1? cc [~saint@gmail.com].
> Once we do this, all related Util methods and APIs should also move to 
> LimitedPrivate Util classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19112) Suspect methods on Cell to be deprecated

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263813#comment-16263813
 ] 

ramkrishna.s.vasudevan commented on HBASE-19112:


Since there is no assignee here, can I take this up as a follow-up to the 
RawCell work being added in HBASE-19092?
[~saint@gmail.com], [~chia7712]?

> Suspect methods on Cell to be deprecated
> 
>
> Key: HBASE-19112
> URL: https://issues.apache.org/jira/browse/HBASE-19112
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Reporter: Josh Elser
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
>
> [~chia7712] suggested on the [mailing 
> list|https://lists.apache.org/thread.html/e6de9af26d9b888a358ba48bf74655ccd893573087c032c0fcf01585@%3Cdev.hbase.apache.org%3E]
>  that we have some methods on Cell which should be deprecated for removal:
> * {{#getType()}}
> * {{#getTimestamp()}}
> * {{#getTag()}}
> * {{#getSequenceId()}}
> Let's make a pass over these (and maybe the rest) to make sure that there 
> aren't others which are either implementation details or methods returning 
> now-private-marked classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19221) NoClassDefFoundError: org/hamcrest/SelfDescribing while running IT tests in 2.0-alpha4

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263810#comment-16263810
 ] 

ramkrishna.s.vasudevan commented on HBASE-19221:


I don't know. I need to check with the recent branch-2, but alpha4 had the 
issue.

> NoClassDefFoundError: org/hamcrest/SelfDescribing while running IT tests in 
> 2.0-alpha4
> --
>
> Key: HBASE-19221
> URL: https://issues.apache.org/jira/browse/HBASE-19221
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-3
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
>
> Copying the mail from the dev@
> {code}
> I tried running some IT test cases using the alpha-4 RC. I found this issue
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/hamcrest/SelfDescribing
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> ...
>at 
> org.apache.hadoop.hbase.IntegrationTestsDriver.doWork(IntegrationTestsDriver.java:111)
> at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:47)
> The same when run against latest master it runs without any issues
> {code}
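
As a quick way to reproduce the classpath gap outside the IT driver, one can 
probe for the class directly (a trivial, self-contained sketch, nothing 
HBase-specific):
{code:java}
// Minimal probe: the NoClassDefFoundError above means hamcrest-core is absent
// from the runtime classpath, which this check makes explicit.
public class HamcrestProbe {
  public static void main(String[] args) {
    try {
      Class.forName("org.hamcrest.SelfDescribing");
      System.out.println("hamcrest is on the classpath");
    } catch (ClassNotFoundException e) {
      System.out.println("hamcrest is missing from the classpath");
    }
  }
}
{code}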



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263808#comment-16263808
 ] 

Ted Yu commented on HBASE-19290:


bq. The WAL split speed was stable at 0.2TB/minute

The above is convincing.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes and hundreds of nodes abort 
> and start splitting logs, the split is very, very slow, and we found the 
> regionserver and master waiting on the zookeeper response, so we need to 
> reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, and when the cluster is huge, this is 
> heavy. This patch reduces those requests.
> (2) When the regionserver has the max split tasks running, it may still try 
> to grab tasks and issue zookeeper requests; we should sleep and wait until we 
> can grab tasks again (sketched below).
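
As a sketch of point (2), the grab loop would only touch zookeeper when a 
splitter slot is actually free. The field and method names below are invented 
stand-ins for the SplitLogWorker internals, not the patch itself:
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative throttle for the task-grabbing loop.
public class GrabLoopSketch {
  private final AtomicInteger tasksInProgress = new AtomicInteger();
  private final int maxConcurrentTasks = 2;

  void grabLoop() throws InterruptedException {
    while (true) {
      if (tasksInProgress.get() >= maxConcurrentTasks) {
        // Saturated: sleep instead of issuing more zookeeper requests.
        Thread.sleep(1000);
        continue;
      }
      grabTask(); // only now go to zookeeper for a task
    }
  }

  void grabTask() { /* zookeeper interaction elided */ }
}
{code}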



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19318) MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase AccessController implementation

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263807#comment-16263807
 ] 

ramkrishna.s.vasudevan commented on HBASE-19318:


Patch LGTM.  So the intention is to check whether the cluster has AccessControl 
services installed, and if so the Ranger CP will do the necessary actions 
before the ACL CP is invoked, right?
On this:
bq. confirmed for me that hbase:acl does somehow get created with Ranger
Is it right to do this on the Ranger side?

> MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase 
> AccessController implementation
> -
>
> Key: HBASE-19318
> URL: https://issues.apache.org/jira/browse/HBASE-19318
> Project: HBase
>  Issue Type: Bug
>  Components: master, security
>Reporter: Sharmadha Sainath
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 1.4.0, 1.3.2, 1.2.7, 2.0.0-beta-1
>
> Attachments: HBASE-19318.001.branch-2.patch
>
>
> Sharmadha brought a failure to my attention trying to use Ranger with HBase 
> 2.0 where the {{grant}} command was erroring out unexpectedly. The cluster 
> had the Ranger-specific coprocessors deployed, per what was previously 
> working on the HBase 1.1 line.
> After some digging, I found that the Master is actually making a check 
> explicitly for a Coprocessor that has the name 
> {{org.apache.hadoop.hbase.security.access.AccessController}} (short name or 
> full name), instead of looking for a deployed coprocessor which can be 
> assigned to {{AccessController}} (which is what Ranger does). We have the 
> CoprocessorHost methods to do the latter already implemented; it strikes me 
> that we just accidentally used the wrong method in MasterRpcServices.
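
The difference between the two lookups is roughly the following sketch (treat 
the exact {{findCoprocessor}} overloads as assumptions about the 
CoprocessorHost API rather than the literal call sites):
{code:java}
// Buggy pattern: matches only the exact class name, so Ranger's coprocessor,
// which merely extends AccessController, is never found.
Object byName = cpHost.findCoprocessor(
    "org.apache.hadoop.hbase.security.access.AccessController");

// Intended pattern: matches any deployed coprocessor assignable to
// AccessController, which covers subclasses such as Ranger's.
AccessController byType = cpHost.findCoprocessor(AccessController.class);
{code}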



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263803#comment-16263803
 ] 

Appy commented on HBASE-19335:
--

Running TestRegionObserverInterface on a local machine took 84 sec (after the 
change).
There are ~10 tests, each with a 5 min individual timeout. Too much. The test 
class is labelled MediumTests; let's use that and our standard procedure - 
category-based timeout. 3 min per test function should be enough even on slower 
Apache machines. Removing individual timeouts and using CategoryBasedTimeout.
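
For reference, the category-based pattern is roughly the following sketch; the 
builder calls follow the CategoryBasedTimeout test utility as I recall it, so 
treat the exact names as assumptions:
{code:java}
import org.apache.hadoop.hbase.CategoryBasedTimeout;
import org.junit.Rule;
import org.junit.rules.TestRule;

public class TestRegionObserverInterface {
  // One rule per class; the timeout is derived from the test category
  // (SmallTests/MediumTests/LargeTests) instead of per-method values.
  @Rule
  public final TestRule timeout = CategoryBasedTimeout.builder()
      .withTimeout(this.getClass())
      .withLookingForStuckThread(true)
      .build();
}
{code}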

> Fix waitUntilAllRegionsAssigned
> ---
>
> Key: HBASE-19335
> URL: https://issues.apache.org/jira/browse/HBASE-19335
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>
> Found when debugging flaky test TestRegionObserverInterface#testRecovery.
> In the end, the test does the following:
> - Kills the RS
> - Waits for all regions to be assigned
> - Some validation (unrelated)
> - Cleanup: delete table.
> {noformat}
>   cluster.killRegionServer(rs1.getRegionServer().getServerName());
>   Threads.sleep(1000); // Let the kill soak in.
>   util.waitUntilAllRegionsAssigned(tableName);
>   LOG.info("All regions assigned");
>   verifyMethodResult(SimpleRegionObserver.class,
> new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
> "getCtPreWALRestore",
> "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
> tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
> } finally {
>   util.deleteTable(tableName);
>   table.close();
> }
>   }
> {noformat}
> However, looking at the test logs, we found that we had Assigns overlapping 
> with Unassigns. As a result, regions ended up 'stuck in RIT' and the test 
> timed out.
> Assigns were from the ServerCrashRecovery and Unassigns were from the 
> deleteTable cleanup.
> Which raises the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
> not wait until recovery was complete?
> Answer: Looks like that function is only meant for sunny-day scenarios, not 
> for crashes. It iterates over meta and just [checks for *some value* in the 
> server 
> column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
>  which is obviously present and equal to the server that was just killed.
> This bug must be affecting other fault-tolerance tests too, and fixing it may 
> fix more than just one test, hopefully.
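
A stricter check, in the spirit of what the description asks for, might look 
like this sketch. The {{liveServerAddresses}} set is a hypothetical input; the 
real fix would decide where that information comes from:
{code:java}
import java.util.Set;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper: accept a region as assigned only if the meta server
// column names a server we currently consider live, instead of any value.
public final class AssignmentCheck {
  static boolean isAssignedToLiveServer(Result metaRow,
      Set<String> liveServerAddresses) {
    byte[] server =
        metaRow.getValue(HConstants.CATALOG_FAMILY, HConstants.SERVER_QUALIFIER);
    return server != null && liveServerAddresses.contains(Bytes.toString(server));
  }
}
{code}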



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263801#comment-16263801
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-1.4 #1024 (See 
[https://builds.apache.org/job/HBase-1.4/1024/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
c9246588ec35aca5d89db98dba2b8d1fa38dfd31)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.
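
The arithmetic is easy to demonstrate in isolation (a self-contained 
illustration, not the DumpReplicationQueues code itself):
{code:java}
public class WalSizeOverflow {
  public static void main(String[] args) {
    long walSize = 3L * 1024 * 1024 * 1024; // one 3 GB WAL file
    int totalInt = 0;
    long totalLong = 0L;
    for (int i = 0; i < 2; i++) {
      totalInt += (int) walSize; // wraps: int max is ~2.1 GB
      totalLong += walSize;
    }
    System.out.println(totalInt);  // -2147483648 (overflowed)
    System.out.println(totalLong); // 6442450944 (correct)
  }
}
{code}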



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4

2017-11-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-19300:
---
Assignee: Ted Yu
  Status: Patch Available  (was: Open)

> TestMultithreadedTableMapper fails in branch-1.4
> 
>
> Key: HBASE-19300
> URL: https://issues.apache.org/jira/browse/HBASE-19300
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 19300.branch-1.4.patch
>
>
> From 
> https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/
>  :
> {code}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136)
> {code}
> I ran the test locally and it failed.
> Noticed the following in test output:
> {code}
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. 
> But only got 0 results completely at client. Resetting the scanner to scan 
> again.
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:426)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-11-18 19:28:14,464 ERROR [hconnection-0x11db8653-shared--pool24-t2] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> java.io.EOFException: Partial cell read
>   at 
> org.apache.hadoop.hbase.codec.BaseDecoder.rethrowEofException(BaseDecoder.java:86)
>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:70)
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:419)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Premature EOF from inputStream
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:202)
>   at org.apache.hadoop.hbase.KeyValueUtil.iscreate(KeyValueUtil.java:611)
>   at 
> 

[jira] [Updated] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4

2017-11-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-19300:
---
Attachment: 19300.branch-1.4.patch

This patch allows the test to pass.

> TestMultithreadedTableMapper fails in branch-1.4
> 
>
> Key: HBASE-19300
> URL: https://issues.apache.org/jira/browse/HBASE-19300
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 19300.branch-1.4.patch
>
>
> From 
> https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/
>  :
> {code}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136)
> {code}
> I ran the test locally and it failed.
> Noticed the following in test output:
> {code}
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. 
> But only got 0 results completely at client. Resetting the scanner to scan 
> again.
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:426)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-11-18 19:28:14,464 ERROR [hconnection-0x11db8653-shared--pool24-t2] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> java.io.EOFException: Partial cell read
>   at 
> org.apache.hadoop.hbase.codec.BaseDecoder.rethrowEofException(BaseDecoder.java:86)
>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:70)
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:419)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Premature EOF from inputStream
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:202)
>   at org.apache.hadoop.hbase.KeyValueUtil.iscreate(KeyValueUtil.java:611)
>   at 
> 

[jira] [Commented] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263796#comment-16263796
 ] 

Ted Yu commented on HBASE-19300:


As far as I can tell, synchronizing on outer (the context) is correct:
{code}
  public void run(Context context) throws IOException, InterruptedException {
outer = context;
{code}
I don't know why error-prone flagged {{synchronized (outer)}}.
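
If the flag came from error-prone's synchronize-on-non-final-field family of 
checks, the concern would be the pattern below (a sketch of the shape, not the 
actual MultithreadedTableMapper code):
{code:java}
// Locking on a mutable field means two threads can end up holding monitors of
// *different* objects if the field is reassigned between their reads.
class RunnerSketch {
  private Object outer; // non-final and reassigned in run(); this is what
                        // static analysis tends to flag

  void run(Object context) {
    outer = context;
    synchronized (outer) { // lock identity depends on when 'outer' was read
      // ... critical section ...
    }
  }
}
{code}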

> TestMultithreadedTableMapper fails in branch-1.4
> 
>
> Key: HBASE-19300
> URL: https://issues.apache.org/jira/browse/HBASE-19300
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>
> From 
> https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/
>  :
> {code}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136)
> {code}
> I ran the test locally and it failed.
> Noticed the following in test output:
> {code}
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. 
> But only got 0 results completely at client. Resetting the scanner to scan 
> again.
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:426)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-11-18 19:28:14,464 ERROR [hconnection-0x11db8653-shared--pool24-t2] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> java.io.EOFException: Partial cell read
>   at 
> org.apache.hadoop.hbase.codec.BaseDecoder.rethrowEofException(BaseDecoder.java:86)
>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:70)
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:419)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Premature EOF from inputStream
>   at 

[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263797#comment-16263797
 ] 

ramkrishna.s.vasudevan commented on HBASE-16890:


I think I need to repeat my test here. I checked the old comments and found one 
thing that was missing in recent tests: increasing the number of cols. Will do 
that now and report back. I think that was the difference.
The YCSB reports suggest that throughput decreases, though very little, 
compared to FSHLog. That seems to be the opposite of what I got.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263793#comment-16263793
 ] 

Ted Yu commented on HBASE-19300:


After a brief bisect (cutting the patch in half and shrinking the number of 
files touched), I narrowed it down to the changes in 
MultithreadedTableMapper.java. With the changes, the test times out.

> TestMultithreadedTableMapper fails in branch-1.4
> 
>
> Key: HBASE-19300
> URL: https://issues.apache.org/jira/browse/HBASE-19300
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>
> From 
> https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/
>  :
> {code}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136)
> {code}
> I ran the test locally and it failed.
> Noticed the following in test output:
> {code}
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] 
> protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 
> results completely atclient. Resetting the scanner to scan again.
> 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. 
> But only got 0 results completely at client. Resetting the scanner to scan 
> again.
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:426)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-11-18 19:28:14,464 ERROR [hconnection-0x11db8653-shared--pool24-t2] 
> protobuf.ResponseConverter(432): Exception while reading cells from 
> result.Resetting the scanner toscan again.
> java.io.EOFException: Partial cell read
>   at 
> org.apache.hadoop.hbase.codec.BaseDecoder.rethrowEofException(BaseDecoder.java:86)
>   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:70)
>   at 
> org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:419)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Premature EOF from inputStream
>   at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:202)
>   at 

[jira] [Commented] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263791#comment-16263791
 ] 

Hudson commented on HBASE-19317:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4101/])
HBASE-19317 Set a high NodeManager max disk utilization if not already (elserj: 
rev 6f0c9fbfd1f17f5f50d90464866d153286b051a5)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> Increase 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  to avoid host-related failures on MiniMRCluster
> 
>
> Key: HBASE-19317
> URL: https://issues.apache.org/jira/browse/HBASE-19317
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests, test
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19317.001.branch-2.patch, 
> HBASE-19317.002.branch-2.patch
>
>
> YARN (2.7.4, at least) defaults to requiring at least 10% of the disk to be 
> free on the local machine in order for the NodeManagers to function.
> On my development machine, despite having over 50G free, I would see the 
> warning from the NM that all the local dirs were bad, which would cause the 
> test to become stuck waiting to submit a mapreduce job. Surefire would 
> eventually kill the process.
> We should increase this value to avoid it causing us headaches.
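
Programmatically, the workaround amounts to something like this sketch (the 
99% value is an arbitrary illustration; the patch picks its own threshold):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Raise the NodeManager disk-utilization ceiling in the test config so a
// nearly full (but healthy) developer disk does not fail the NM health check.
public class MiniMrDiskConfig {
  static Configuration withRelaxedDiskCheck(Configuration conf) {
    conf.setFloat(
        "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
        99.0f);
    return conf;
  }
}
{code}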



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263790#comment-16263790
 ] 

Hudson commented on HBASE-19310:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4101/])
HBASE-19310 Avoid an NPE IntegrationTestImportTsv when outside of the (elserj: 
rev b0b606429339aabe9fb964af6bf3c3129b3ac375)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java


> Verify IntegrationTests don't rely on Rules outside of JUnit context
> 
>
> Key: HBASE-19310
> URL: https://issues.apache.org/jira/browse/HBASE-19310
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19310.001.branch-2.patch, 
> HBASE-19310.002.branch-2.patch
>
>
> {noformat}
> 2017-11-16 00:43:41,204 INFO  [main] mapreduce.IntegrationTestImportTsv: 
> Running test testGenerateAndLoad.
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239)
> {noformat}
> (Potential line-number skew)
> {code}
>   @Test
>   public void testGenerateAndLoad() throws Exception {
> LOG.info("Running test testGenerateAndLoad.");
> final TableName table = TableName.valueOf(name.getMethodName());
> {code}
> The JUnit framework sets the test method name inside the JUnit {{Rule}}. 
> When we invoke the test directly (a la {{hbase 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this 
> {{getMethodName()}} returns {{null}} and we get the above stacktrace.
> We should make a pass over the ITs with main methods and {{Rule}}s to make 
> sure we don't have this lurking. An alternative is to just remove the main 
> methods and force use of {{IntegrationTestsDriver}} instead.
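
One possible guard, if the main methods were kept, is a null-safe fallback for 
the rule-provided name. This is a sketch only; the committed fix may well 
choose the IntegrationTestsDriver route instead:
{code:java}
// Fall back to a fixed table name when the JUnit rule never ran because the
// test was launched from main() rather than the JUnit runner.
String method = name.getMethodName(); // null outside the JUnit lifecycle
final TableName table =
    TableName.valueOf(method != null ? method : "testGenerateAndLoad");
{code}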



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263792#comment-16263792
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4101/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
cdc2bb17ff38dcbd273cf501aea565006e995a06)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263773#comment-16263773
 ] 

Yu Li commented on HBASE-19290:
---

Let me try to add more information:

Once, when upgrading the HDFS version, the NN had a fencing problem that caused 
all RSes to abort one by one. After HDFS was restored and the HBase cluster 
restarted, we observed Master threads waiting on zookeeper to return:
{noformat}
"MASTER_SERVER_OPERATIONS-hdpet2mainsem2:60100-28"#2236 prio=5 os_prio=0 
tid=0x7ff526bad800 nid=0xa890 in Object.wait() [0x7ff5150f6000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
- locked <0x0005d9c720d0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1470)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:635)
at 
org.apache.hadoop.hbase.coordination.ZKSplitLogManagerCoordination.remainingTasksInCoordination(ZKSplitLogManagerCoordination.java:150)
at 
org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:353)
- locked <0x0006440826e8> (a 
org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
at 
org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:274)
{noformat}

After investigation we found the root cause: the splitWAL znode contains too 
many children, so the {{getChildren}} call is too time-consuming.

After some further discussion on how to resolve the issue, we think the most 
efficient way is to reduce the speed of publishing split tasks, or rather, to 
publish only when there is an available WAL splitter. Publishing tasks 
aggressively helps nothing; it only slows down the {{getChildren}} operation on 
splitWAL and thus the whole world.

After the patched version went online, we encountered another disaster case 
(unfortunately...) and experienced no more zk contention problems. The WAL 
split speed was stable at 0.2TB/minute.

So we don't have any formal performance testing results, but the theory is 
borne out by observation from the real world; we hope this is convincing 
(smile).

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes and hundreds of nodes abort 
> and start splitting logs, the split is very, very slow, and we found the 
> regionserver and master waiting on the zookeeper response, so we need to 
> reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, and when the cluster is huge, this is 
> heavy. This patch reduces those requests.
> (2) When the regionserver has the max split tasks running, it may still try 
> to grab tasks and issue zookeeper requests; we should sleep and wait until we 
> can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263766#comment-16263766
 ] 

Hudson commented on HBASE-19317:


FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See 
[https://builds.apache.org/job/HBase-2.0/899/])
HBASE-19317 Set a high NodeManager max disk utilization if not already (elserj: 
rev 4e387a948fad9928c9d4922c9055e601d22e4145)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> Increase 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  to avoid host-related failures on MiniMRCluster
> 
>
> Key: HBASE-19317
> URL: https://issues.apache.org/jira/browse/HBASE-19317
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests, test
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19317.001.branch-2.patch, 
> HBASE-19317.002.branch-2.patch
>
>
> YARN (2.7.4, at least) defaults to requiring at least 10% of the disk to be 
> free on the local machine in order for the NodeManagers to function.
> On my development machine, despite having over 50G free, I would see the 
> warning from the NM that all the local dirs were bad, which would cause the 
> test to become stuck waiting to submit a mapreduce job. Surefire would 
> eventually kill the process.
> We should increase this value to avoid it causing us headaches.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263765#comment-16263765
 ] 

Hudson commented on HBASE-19310:


FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See 
[https://builds.apache.org/job/HBase-2.0/899/])
HBASE-19310 Avoid an NPE IntegrationTestImportTsv when outside of the (elserj: 
rev 46cb5d598689577b01cc7690587ae94579b70a11)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java


> Verify IntegrationTests don't rely on Rules outside of JUnit context
> 
>
> Key: HBASE-19310
> URL: https://issues.apache.org/jira/browse/HBASE-19310
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19310.001.branch-2.patch, 
> HBASE-19310.002.branch-2.patch
>
>
> {noformat}
> 2017-11-16 00:43:41,204 INFO  [main] mapreduce.IntegrationTestImportTsv: 
> Running test testGenerateAndLoad.
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239)
> {noformat}
> (Potential line-number skew)
> {code}
>   @Test
>   public void testGenerateAndLoad() throws Exception {
> LOG.info("Running test testGenerateAndLoad.");
> final TableName table = TableName.valueOf(name.getMethodName());
> {code}
> The JUnit framework sets the test method name inside the JUnit {{Rule}}. 
> When we invoke the test directly (a la {{hbase 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this 
> {{getMethodName()}} returns {{null}} and we get the above stacktrace.
> We should make a pass over the ITs with main methods and {{Rule}}s to make 
> sure we don't have this lurking. An alternative is to just remove the main 
> methods and force use of {{IntegrationTestsDriver}} instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263767#comment-16263767
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See 
[https://builds.apache.org/job/HBase-2.0/899/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
135bb5583b44f207e20f2e2caf0d109903f817d4)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263760#comment-16263760
 ] 

binlijin commented on HBASE-19290:
--

{quote}
The patch is adding throttling, so it's certainly changing the way things work. 
It's also adding the 'if' condition controlling throttling, so the choice is 
definitely being made by you.
I suspect you are using grabbedTask=0 as a proxy for 'failed to grab task' and 
wait on it. But when grabbedTask =1, and we still keep failing to grab tasks, 
there is no throttling for that case! Hopefully that makes my question clearer?
{quote}
But when grabbedTask = 1 and we still keep failing to grab tasks, it will end 
the for loop and enter the while (seq_start == taskReadySeq.get()) {} loop. 
Does this have any problem?

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes and hundreds of nodes abort 
> and start splitting logs, the split is very, very slow, and we found the 
> regionserver and master waiting on the zookeeper response, so we need to 
> reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, and when the cluster is huge, this is 
> heavy. This patch reduces those requests.
> (2) When the regionserver has the max split tasks running, it may still try 
> to grab tasks and issue zookeeper requests; we should sleep and wait until we 
> can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263753#comment-16263753
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-1.5 #166 (See 
[https://builds.apache.org/job/HBase-1.5/166/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
20d811121fb38ea2fc3871dcf4b03593bd4d6b7e)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19325) Pass a list of server name to postClearDeadServers

2017-11-22 Thread Guangxu Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-19325:
--
Attachment: HBASE-19325.branch-1.001.patch

 Upload the branch-1 patch.

> Pass a list of server name to postClearDeadServers
> --
>
> Key: HBASE-19325
> URL: https://issues.apache.org/jira/browse/HBASE-19325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Attachments: HBASE-19325.branch-1.001.patch, 
> HBASE-19325.branch-2.001.patch
>
>
> Over on the tail of HBASE-18131, [~chia7712] said:
> {quote}
> (Revisiting the AccessController reminds me of this issue) 
> Could we remove the duplicate code on the server side? Why not pass a list of 
> server names to postClearDeadServers and postListDeadServers?
> {quote}
> The duplicate code has been removed in HBASE-19131. Now pass a list of server 
> names to postClearDeadServers.
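
The reworked hook would then carry the result along, roughly as below. This is 
a sketch of the shape only; the parameter names and the exact signature are 
assumptions, not the committed MasterObserver API:
{code:java}
// The master passes the outcome to the observer, so coprocessors no longer
// have to re-derive which servers were actually cleared.
default void postClearDeadServers(
    ObserverContext<MasterCoprocessorEnvironment> ctx,
    List<ServerName> clearedServers,
    List<ServerName> notClearedServers) throws IOException {
}
{code}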



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263748#comment-16263748
 ] 

binlijin edited comment on HBASE-19290 at 11/23/17 3:27 AM:


bq. The above example lasted for almost an hour. 
bq. With the patch, roughly how long does the log splitting task last?
We did not record the new numbers and the logs do not exist now. But we did 
record that we had 7.1TB of WALs and split them in 40 minutes.



was (Author: aoxiang):
bq. The above example lasted for almost an hour. 
  With the patch, roughly how long does log splitting task last ?
We do not record the new numbers and the log do not exists now. But we record 
that we have 7.1TB wals and split it in 40mins.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes and hundreds of nodes abort 
> and start splitting logs, the split is very, very slow, and we found the 
> regionserver and master waiting on the zookeeper response, so we need to 
> reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, and when the cluster is huge, this is 
> heavy. This patch reduces those requests.
> (2) When the regionserver has the max split tasks running, it may still try 
> to grab tasks and issue zookeeper requests; we should sleep and wait until we 
> can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263748#comment-16263748
 ] 

binlijin commented on HBASE-19290:
--

bq. The above example lasted for almost an hour. 
bq. With the patch, roughly how long does the log splitting task last?
We did not record the new numbers and the logs do not exist now. But we did 
record that we had 7.1TB of WALs and split them in 40 minutes.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes and hundreds of nodes abort 
> and start splitting logs, the split is very, very slow, and we found the 
> regionserver and master waiting on the zookeeper response, so we need to 
> reduce zookeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, and when the cluster is huge, this is 
> heavy. This patch reduces those requests.
> (2) When the regionserver has the max split tasks running, it may still try 
> to grab tasks and issue zookeeper requests; we should sleep and wait until we 
> can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18309) Support multi threads in CleanerChore

2017-11-22 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-18309:
--
 Hadoop Flags: Reviewed
 Release Note: After HBASE-18309 we can use multiple threads to scan 
archive directories (including data and oldWALs) through the config 
hbase.cleaner.scan.dir.concurrent.size, which supports both integer (the 
concrete size) and double (between 0 and 1, the ratio of available CPU 
cores) values and defaults to 0.5. Please take 
hbase.regionserver.hfilecleaner.large.thread.count and 
hbase.regionserver.hfilecleaner.small.thread.count into account when setting 
this config to avoid thread flooding. We also support using multiple threads 
to clean WALs in a single directory through hbase.oldwals.cleaner.thread.size, 
2 by default.
Fix Version/s: 2.0.0-beta-1
   3.0.0
  Description: 
There is only one thread in LogCleaner to clean oldWALs, and in our big cluster 
we find this is not enough. The number of files under oldWALs reached the 
max-directory-items limit of HDFS and caused a region server crash, so we use 
multiple threads for LogCleaner and the crash has not happened any more.

What's more, currently there's only one thread iterating the archive directory, 
and we could use multiple threads cleaning subdirectories in parallel to speed 
it up.

  was:There is only one thread in LogCleaner to clean oldWALs and in our big 
cluster we find this is not enough. The number of files under oldWALs reach the 
max-directory-items limit of HDFS and cause region server crash, so we use 
multi threads for LogCleaner and the crash not happened any more.

  Component/s: (was: wal)

[~reidchan] please check the release note and feel free to refine it. It's 
recommended to add a release note when introducing new properties, so people 
can better know how to use them.
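
As a usage illustration of the note above (the values shown are just the 
defaults restated):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CleanerChoreConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // 0.5 = use half of the available CPU cores to scan archive directories
    conf.set("hbase.cleaner.scan.dir.concurrent.size", "0.5");
    // two threads cleaning oldWALs within a single directory (the default)
    conf.setInt("hbase.oldwals.cleaner.thread.size", 2);
  }
}
{code}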

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: Reid Chan
> Fix For: 3.0.0, 2.0.0-beta-1
>
> Attachments: HBASE-18309.master.001.patch, 
> HBASE-18309.master.002.patch, HBASE-18309.master.004.patch, 
> HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, 
> HBASE-18309.master.007.patch, HBASE-18309.master.008.patch, 
> HBASE-18309.master.009.patch, HBASE-18309.master.010.patch, 
> HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, 
> space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs, and in our big 
> cluster we find this is not enough. The number of files under oldWALs reached 
> the max-directory-items limit of HDFS and caused a region server crash, so we 
> use multiple threads for LogCleaner and the crash has not happened any more.
> What's more, currently there's only one thread iterating the archive 
> directory, and we could use multiple threads cleaning subdirectories in 
> parallel to speed it up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263742#comment-16263742
 ] 

Ted Yu commented on HBASE-19290:


Lijin:
The above example lasted for almost an hour.

With the patch, roughly how long does the log splitting task last?

Thanks

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort 
> and do log splitting, the split is very slow: the regionservers and master 
> wait on zookeeper responses, so we need to reduce zookeeper requests and 
> pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, which is heavy when the cluster is 
> huge. This patch reduces those requests.
> (2) When the regionserver already has the max number of split tasks running, 
> it may still try to grab tasks and issue zookeeper requests; we should sleep 
> and wait until we can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19093) Check Admin/Table to ensure all operations go via AccessControl

2017-11-22 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263741#comment-16263741
 ] 

Anoop Sam John commented on HBASE-19093:


bq. If we add a new method to MasterRpcServices but don't add pre/post methods 
to MasterObserver, it will still miss the ACL check?
Good point.  I wanted to come to this jira and check the attached patch but got 
sidetracked by something else.  I have a doubt on the general approach.  The 
issue is that when we add new client functions (say, adding Quota things), 
there is a chance that we miss the ACL checks. It is not commonly seen that 
hooks are added around the ops but the impl is missed in AC. In fact, most of 
the time AC is the prompting factor for adding hooks. We cleaned up some hooks 
recently which were exposing too much internal stuff to CPs (around procedures, 
locks). All those hooks were designed so as to do some AC checks.  So the 
problem is mostly the other way around compared to what the patch is trying to 
do.  Not sure how we can add a test for that.

> Check Admin/Table to ensure all operations go via AccessControl
> ---
>
> Key: HBASE-19093
> URL: https://issues.apache.org/jira/browse/HBASE-19093
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: Balazs Meszaros
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19093.master.001.patch, 
> HBASE-19093.master.002.patch, RegionObserver.txt
>
>
> A cursory review of the Admin Interface shows a bunch of methods as open, 
> without AccessControl checks. For example, the procedure executor has no 
> check on it.
> This issue is about giving the Admin and Table Interfaces a once-over to see 
> what is missing and to fill in access control where missing.
> This is a follow-on from work over in HBASE-19048



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263738#comment-16263738
 ] 

Duo Zhang commented on HBASE-16890:
---

The ops metrics are 81435 vs. 77108: FSHLog is 5% more, but the run time is 
almost the same? The unit is milliseconds, so for a 10-minute run the diff is 
only 4ms while there is a 5% difference in ops?

And the second one is even stranger: 393 vs. 458, AsyncFSWAL is 16% more, but 
the run time diff is still only 3s for a 10-minute run, a 0.5% difference?

Could you please explain more about the results?

Thanks.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263728#comment-16263728
 ] 

Anoop Sam John commented on HBASE-16890:


So as per your tests the higher-percentile latency values are better with async 
WAL, but the avg latency and the throughput are slightly on the lower side.  The 
test Ram did was also around throughput only.  Initially it looked like async 
WAL throughput was on the lower side; later his tests did not say so. I don't 
know the details of those later tests.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263729#comment-16263729
 ] 

binlijin commented on HBASE-19290:
--

We observe that when the cluster is big, regionservers issue too many zookeeper 
requests to get availableRSs from rsZNode and also getTaskList from 
splitLogZNode. The more nodes there are, the heavier getting availableRSs becomes.
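
For illustration, a minimal sketch of the caching idea (hypothetical names, not 
the actual HBASE-19290 patch): fetch rsZNode's children once with a watch, and 
refresh only when the watch fires instead of on every calculateAvailableSplitters 
call.

{code:java}
import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch, not the actual patch: cache rsZNode's children and
// re-read them only when the watch fires, removing the per-call getChildren.
class CachedRsList implements Watcher {
  private final ZooKeeper zk;
  private volatile List<String> cached;

  CachedRsList(ZooKeeper zk) throws Exception {
    this.zk = zk;
    refresh();
  }

  private void refresh() throws Exception {
    cached = zk.getChildren("/hbase/rs", this); // one call re-arms the watch
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      refresh(); // only when membership actually changes
    } catch (Exception e) {
      // real code would retry with backoff instead of swallowing this
    }
  }

  List<String> availableRSs() {
    return cached; // served from the cache, no zookeeper round trip
  }
}
{code}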

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort 
> and do log splitting, the split is very slow: the regionservers and master 
> wait on zookeeper responses, so we need to reduce zookeeper requests and 
> pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, which is heavy when the cluster is 
> huge. This patch reduces those requests.
> (2) When the regionserver already has the max number of split tasks running, 
> it may still try to grab tasks and issue zookeeper requests; we should sleep 
> and wait until we can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore

2017-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263723#comment-16263723
 ] 

Yu Li commented on HBASE-18309:
---

Ok, let me commit this one.

[~stack] I'm planning to commit this one into branch-2 too if you don't mind 
boss. Thanks.

> Support multi threads in CleanerChore
> -
>
> Key: HBASE-18309
> URL: https://issues.apache.org/jira/browse/HBASE-18309
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: binlijin
>Assignee: Reid Chan
> Attachments: HBASE-18309.master.001.patch, 
> HBASE-18309.master.002.patch, HBASE-18309.master.004.patch, 
> HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, 
> HBASE-18309.master.007.patch, HBASE-18309.master.008.patch, 
> HBASE-18309.master.009.patch, HBASE-18309.master.010.patch, 
> HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, 
> space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs and in our big 
> cluster we find this is not enough. The number of files under oldWALs reach 
> the max-directory-items limit of HDFS and cause region server crash, so we 
> use multi threads for LogCleaner and the crash not happened any more.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263720#comment-16263720
 ] 

binlijin commented on HBASE-19290:
--

[~tedyu]
bq. Assuming patch v3 is very close to the version you run in the 2000+ node 
production cluster, can you post some performance numbers (in terms of 
reduction in zookeeper requests) so that we can know its effectiveness ?

We did not record the performance numbers.
But without the patch we can see the HMaster gets the zookeeper events very 
slowly...

HMaster puts up a split task:

*2017-07-11 20:22:57,608* DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: put up splitlog task at znode 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

A RegionServer grabs the task and completes it:

*2017-07-11 20:23:33,689* INFO  [SplitLogWorker-hadoop1435:16020] 
coordination.ZkSplitLogWorkerCoordination: worker 
hadoop1435.et2.tbsite.net,16020,1495647366458 acquired task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 20:25:47,131* INFO  [RS_LOG_REPLAY_OPS-hadoop1435:16020-1] 
coordination.ZkSplitLogWorkerCoordination: successfully transitioned task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
 to final state DONE hadoop1435.et2.tbsite.net,16020,1495647366458

HMaster gets the task-done event and deletes it:

*2017-07-11 20:49:52,879* INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
 entered state: DONE hadoop1435.et2.tbsite.net,16020,1495647366458

*2017-07-11 20:49:52,881* INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: Done splitting 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 21:19:52,280* DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort 
> and do log splitting, the split is very slow: the regionservers and master 
> wait on zookeeper responses, so we need to reduce zookeeper requests and 
> pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, which is heavy when the cluster is 
> huge. This patch reduces those requests.
> (2) When the regionserver already has the max number of split tasks running, 
> it may still try to grab tasks and issue zookeeper requests; we should sleep 
> and wait until we can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HBASE-19334) User.runAsLoginUser not work in AccessController because it use a short circuited connection

2017-11-22 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-19334:
--

Assignee: Guanghao Zhang

> User.runAsLoginUser not work in AccessController because it use a short 
> circuited connection
> 
>
> Key: HBASE-19334
> URL: https://issues.apache.org/jira/browse/HBASE-19334
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>
> The short-circuited connection bypasses the RPC layer, so the RPC context 
> doesn't change. It still uses the old RPC user to write the ACL table, and 
> User.runAsLoginUser does not work.
> AccessController's grant method.
> {code}
> User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
>   @Override
>   public Void run() throws Exception {
>     // regionEnv is set at #start. Hopefully not null at this point.
>     try (Table table = regionEnv.getConnection().
>         getTable(AccessControlLists.ACL_TABLE_NAME)) {
>       AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm, table,
>           request.getMergeExistingPermissions());
>     }
>     return null;
>   }
> });
> {code}
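
One possible direction, as a hedged sketch only (not a committed fix): create a 
fresh connection via ConnectionFactory inside the doAs block, so the call goes 
through the normal (non-short-circuited) path and picks up the login user.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.security.User;

// Hedged sketch only. ConnectionFactory.createConnection builds a normal
// client connection, so the effective user is the runAsLoginUser one rather
// than the RPC user cached in the short-circuited connection.
User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    try (Connection conn =
            ConnectionFactory.createConnection(regionEnv.getConfiguration());
         Table table = conn.getTable(AccessControlLists.ACL_TABLE_NAME)) {
      AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm, table,
          request.getMergeExistingPermissions());
    }
    return null;
  }
});
{code}

Whether opening a new connection per grant call is acceptable cost-wise is a 
separate question; this only illustrates why the short circuit defeats the doAs.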



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19093) Check Admin/Table to ensure all operations go via AccessControl

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263719#comment-16263719
 ] 

Guanghao Zhang commented on HBASE-19093:


If we add a new method to MasterRpcServices but don't add pre/post methods to 
MasterObserver, it will still miss the ACL check?

> Check Admin/Table to ensure all operations go via AccessControl
> ---
>
> Key: HBASE-19093
> URL: https://issues.apache.org/jira/browse/HBASE-19093
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: Balazs Meszaros
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19093.master.001.patch, 
> HBASE-19093.master.002.patch, RegionObserver.txt
>
>
> A cursory review of the Admin Interface shows a bunch of methods as open, 
> without AccessControl checks. For example, the procedure executor has no 
> check on it.
> This issue is about giving the Admin and Table Interfaces a once-over to see 
> what is missing and to fill in access control where missing.
> This is a follow-on from work over in HBASE-19048



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263710#comment-16263710
 ] 

Hudson commented on HBASE-19332:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #295 (See 
[https://builds.apache.org/job/HBase-1.3-IT/295/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
276052be990e439d74d9d7871e1242dd1b1c8de7)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.
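
For illustration, a tiny sketch of why the int overflows: WAL files are 
routinely hundreds of MB, so a queue only needs to total past 2^31-1 bytes 
(about 2 GB) for the sum to wrap negative.

{code:java}
// Two 1.5 GB WALs already exceed Integer.MAX_VALUE (2147483647).
int totalInt = 0;
long totalLong = 0L;
for (long walSize : new long[] { 1_500_000_000L, 1_500_000_000L }) {
  totalInt += (int) walSize; // wraps once the sum passes Integer.MAX_VALUE
  totalLong += walSize;      // correct
}
// totalInt == -1294967296, totalLong == 3000000000
{code}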



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263709#comment-16263709
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #355 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/355/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
276052be990e439d74d9d7871e1242dd1b1c8de7)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263707#comment-16263707
 ] 

Hudson commented on HBASE-19332:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #375 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/375/])
HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 
276052be990e439d74d9d7871e1242dd1b1c8de7)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java


> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-19332:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to branch-1.3+.  Thanks for review [~tedyu].

> DumpReplicationQueues misreports total WAL size
> ---
>
> Key: HBASE-19332
> URL: https://issues.apache.org/jira/browse/HBASE-19332
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Trivial
> Fix For: 2.0.0, 3.0.0, 1.3.2
>
> Attachments: HBASE-19332.patch
>
>
> DumpReplicationQueues uses an int to collect the total WAL size for a queue.  
> Predictably, this overflows much of the time.  Let's use a long instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263699#comment-16263699
 ] 

binlijin edited comment on HBASE-19290 at 11/23/17 2:32 AM:


[~appy]
bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve 
the code further was appropriate and made sense.
bq. Here's the diff on what he was suggesting (and what i was thinking earlier).
Done in HBASE-19290.master.004.patch .



was (Author: aoxiang):
[~appy]
bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve 
the code further was appropriate and made sense.
Here's the diff on what he was suggesting (and what i was thinking earlier).
Done in HBASE-19290.master.004.patch .


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263701#comment-16263701
 ] 

Appy commented on HBASE-19335:
--

I have a patch which I'll upload shortly.

> Fix waitUntilAllRegionsAssigned
> ---
>
> Key: HBASE-19335
> URL: https://issues.apache.org/jira/browse/HBASE-19335
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>
> Found when debugging flaky test TestRegionObserverInterface#testRecovery.
> In the end, the test does the following:
> - Kills the RS
> - Waits for all regions to be assigned
> - Some validation (unrelated)
> - Cleanup: delete table.
> {noformat}
>   cluster.killRegionServer(rs1.getRegionServer().getServerName());
>   Threads.sleep(1000); // Let the kill soak in.
>   util.waitUntilAllRegionsAssigned(tableName);
>   LOG.info("All regions assigned");
>   verifyMethodResult(SimpleRegionObserver.class,
> new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
> "getCtPreWALRestore",
> "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
> tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
> } finally {
>   util.deleteTable(tableName);
>   table.close();
> }
>   }
> {noformat}
> However, looking at test logs, we found that we had overlapping Assigns and 
> Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out.
> Assigns were from the ServerCrashRecovery and Unassigns were from the 
> deleteTable cleanup.
> Which begs the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
> not wait until recovery was complete?
> Answer: Looks like that function is only meant for sunny scenarios but not 
> for crashes. It iterates over meta and just [checks for *some value* in the 
> server 
> column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
>  which is obviously present and equal to the server that was just killed.
> This bug must be affecting other fault tolerance tests too and fixing it may 
> fix more than just one test, hopefully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19335) Fix waitUntilAllRegionsAssigned

2017-11-22 Thread Appy (JIRA)
Appy created HBASE-19335:


 Summary: Fix waitUntilAllRegionsAssigned
 Key: HBASE-19335
 URL: https://issues.apache.org/jira/browse/HBASE-19335
 Project: HBase
  Issue Type: Bug
Reporter: Appy
Assignee: Appy


Found when debugging flaky test TestRegionObserverInterface#testRecovery.
In the end, the test does the following:
- Kills the RS
- Waits for all regions to be assigned
- Some validation (unrelated)
- Cleanup: delete table.
{noformat}
  cluster.killRegionServer(rs1.getRegionServer().getServerName());
  Threads.sleep(1000); // Let the kill soak in.
  util.waitUntilAllRegionsAssigned(tableName);
  LOG.info("All regions assigned");

  verifyMethodResult(SimpleRegionObserver.class,
new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", 
"getCtPreWALRestore",
"getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
} finally {
  util.deleteTable(tableName);
  table.close();
}
  }
{noformat}

However, looking at test logs, we found that we had overlapping Assigns and 
Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out.
Assigns were from the ServerCrashRecovery and Unassigns were from the 
deleteTable cleanup.
Which begs the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) 
not wait until recovery was complete?

Answer: Looks like that function is only meant for sunny scenarios but not for 
crashes. It iterates over meta and just [checks for *some value* in the server 
column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421]
 which is obviously present and equal to the server that was just killed.

This bug must be affecting other fault tolerance tests too and fixing it may 
fix more than just one test, hopefully.
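
A hedged sketch of the stricter check the description implies (hypothetical 
helper, not the actual patch): treat a region as assigned only when meta's 
server column points at a currently-live server.

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;

// Hypothetical sketch, not the actual HBASE-19335 patch. Assumes the value
// read from meta's info:server column is a "host:port" string.
boolean isAssignedToLiveServer(Admin admin, String hostAndPortFromMeta)
    throws IOException {
  if (hostAndPortFromMeta == null) {
    return false; // no server column at all: clearly not assigned
  }
  for (ServerName live : admin.getClusterStatus().getServers()) {
    if (live.getAddress().toString().equals(hostAndPortFromMeta)) {
      return true; // named server is live, accept the assignment
    }
  }
  return false; // server column names a dead RS (e.g. the one just killed)
}
{code}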



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-19290:
-
Attachment: HBASE-19290.master.004.patch

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort 
> and do log splitting, the split is very slow: the regionservers and master 
> wait on zookeeper responses, so we need to reduce zookeeper requests and 
> pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, which is heavy when the cluster is 
> huge. This patch reduces those requests.
> (2) When the regionserver already has the max number of split tasks running, 
> it may still try to grab tasks and issue zookeeper requests; we should sleep 
> and wait until we can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263699#comment-16263699
 ] 

binlijin commented on HBASE-19290:
--

[~appy]
bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve 
the code further was appropriate and made sense.
Here's the diff on what he was suggesting (and what i was thinking earlier).
Done in HBASE-19290.master.004.patch .


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe that once the cluster has 1000+ nodes, when hundreds of nodes abort 
> and do log splitting, the split is very slow: the regionservers and master 
> wait on zookeeper responses, so we need to reduce zookeeper requests and 
> pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters 
> gets rsZNode's children from zookeeper, which is heavy when the cluster is 
> huge. This patch reduces those requests.
> (2) When the regionserver already has the max number of split tasks running, 
> it may still try to grab tasks and issue zookeeper requests; we should sleep 
> and wait until we can grab tasks again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19035) Miss metrics when coprocessor use region scanner to read data

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263689#comment-16263689
 ] 

Guanghao Zhang commented on HBASE-19035:


The branch-1 Hadoop QA build timed out. See 
https://builds.apache.org/job/PreCommit-HBASE-Build/9963/console
It took 6 hours to run the hbase-server tests, then timed out.

09:19:11 cd /testptch/hbase/hbase-server
09:19:11 mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hbase-branch-1-patch-1 
-DHBasePatchProcess -PrunAllTests 
-Dtest.exclude.pattern=**/TestClassFinder.java,**/client.TestGet.java,**/master.cleaner.TestReplicationZKNodeCleaner.java,**/snapshot.TestExportSnapshot.java,**/master.TestAssignmentManagerMetrics.java,**/client.TestShell.java,**/master.assignment.TestAssignmentManager.java,**/master.assignment.TestMergeTableRegionsProcedure.java,**/client.TestAsyncTableGetMultiThreaded.java,**/security.visibility.TestVisibilityLabelsOnNewVersionBehaviorTable.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/client.TestBlockEvictionFromClient.java,**/security.access.TestCoprocessorWhitelistMasterObserver.java,**/master.TestRollingRestart.java,**/client.TestTableSnapshotScanner.java,**/client.TestAsyncTableScanAll.java,**/rsgroup.TestRSGroups.java,**/quotas.TestMasterSpaceQuotaObserver.java,**/replication.TestReplicationKillSlaveRS.java,**/replication.TestReplicationDroppedTables.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/snapshot.TestMobRestoreFlushSnapshotFromClient.java,**/client.locking.TestEntityLocks.java,**/client.TestScannersFromClientSide.java,**/quotas.TestSpaceQuotasWithSnapshots.java,**/client.TestMobSnapshotCloneIndependence.java,**/client.TestReplicaWithCluster.java,**/quotas.TestQuotaAdmin.java,**/TestCheckTestClasses.java,**/master.procedure.TestEnableTableProcedure.java,**/regionserver.TestSplitTransactionOnCluster.java,**/client.TestMultiParallel.java,**/client.TestSizeFailures.java,**/client.TestRestoreSnapshotFromClientWithRegionReplicas.java,**/client.TestAdmin2.java,**/regionserver.TestHRegion.java,**/master.procedure.TestTruncateTableProcedure.java,**/security.visibility.TestVisibilityLabelsWithACL.java,**/master.TestWarmupRegion.java,**/snapshot.TestSecureExportSnapshot.java,**/io.encoding.TestLoadAndSwitchEncodeOnDisk.java,**/master.procedure.TestServerCrashProcedure.java,**/client.replication.TestReplicationAdminWithClusters.java,**/client.TestHCM.java,**/client.replication.TestReplicationAdminWithTwoDifferentZKClusters.java,**/TestJMXListener.java,**/trace.TestHTraceHooks.java,**/replication.TestReplicationSyncUpTool.java,**/client.TestMultiRespectsLimits.java,**/regionserver.TestCompactionInDeadRegionServer.java,**/client.TestAsyncTableAdminApi.java,**/snapshot.TestMobSecureExportSnapshot.java,**/replication.TestMasterReplication.java,**/client.TestAsyncSnapshotAdminApi.java,**/master.assignment.TestAssignmentOnRSCrash.java,**/regionserver.wal.TestAsyncLogRolling.java,**/replication.TestReplicationSmallTests.java,**/snapshot.TestMobFlushSnapshotFromClient.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/TestAcidGuarantees.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.regionserver.TestTableBasedReplicationSourceManagerImpl.java,**/TestZooKeeper.java,**/fs.TestBlockReorder.java,**/client.TestCloneSnapshotFromClient.java,**/security.token.TestTokenAuthentication.java,**/coprocessor.TestRegionObserverInterface.java,**/regionserver.TestFSErrorsExposed.java,**/client.TestMetaWithReplicas.java,**/client.TestFromClientSideWithCoprocessor.java,**/master.TestDistributedLogSplitting.java,**/TestServerSideScanMetricsFromClientSide.java,**/regionserver.TestPerColumnFamilyFlush.java,**/client.TestMobCloneSnapshotFromClient.java,**/TestRegionRebalancing.java,**/security.visibility.TestVisibilityLabelsWithDeletes.java,**/master.procedure.TestMasterFailoverWithProcedures.java,**/master.cleaner.TestHFileCleaner.java
 clean test -fae > /testptch/patchprocess/patch-unit-hbase-server.txt 2>&1
15:30:49 Build timed out (after 420 minutes). Marking the build as failed.
15:30:50 Build was aborted

> Miss metrics when coprocessor use region scanner to read data
> -
>
> Key: HBASE-19035
> URL: https://issues.apache.org/jira/browse/HBASE-19035
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19035.branch-1.001.patch, 
> HBASE-19035.branch-1.patch, HBASE-19035.branch-1.patch, 
> HBASE-19035.branch-1.patch, HBASE-19035.branch-1.patch, 
> HBASE-19035.master.001.patch, HBASE-19035.master.002.patch, 
> HBASE-19035.master.003.patch, HBASE-19035.master.003.patch
>
>
> Region interface is exposed to coprocessors. So coprocessors use getScanner to 
> get a 

[jira] [Updated] (HBASE-19319) Fix bug in synchronizing over ProcedureEvent

2017-11-22 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-19319:
-
Attachment: HBASE-19319.master.002.patch

> Fix bug in synchronizing over ProcedureEvent
> 
>
> Key: HBASE-19319
> URL: https://issues.apache.org/jira/browse/HBASE-19319
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
> Attachments: HBASE-19319.master.001.patch, 
> HBASE-19319.master.002.patch
>
>
> The following synchronizes over a local variable rather than the original 
> ProcedureEvent object. Clearly a bug, since this code block won't provide 
> mutual exclusion with many of the synchronized methods in the ProcedureEvent 
> class.
> {code}
>  @Override
>   public void wakeEvents(final int count, final ProcedureEvent... events) {
> final boolean traceEnabled = LOG.isTraceEnabled();
> schedLock();
> try {
>   int waitingCount = 0;
>   for (int i = 0; i < count; ++i) {
> final ProcedureEvent event = events[i];
> synchronized (event) {
>   if (!event.isReady()) {
> {code}
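
For context, a hedged illustration (plain Java, not HBase code) of the monitor 
rule the description relies on: synchronized blocks and synchronized methods 
exclude each other only when they lock the same object.

{code:java}
// Illustration only: mutual exclusion requires the same monitor object.
public class MonitorDemo {
  private final Object lock = new Object();
  private final Object other = new Object();

  void a() { synchronized (lock)  { /* excluded from b() */ } }
  void b() { synchronized (lock)  { /* excluded from a() */ } }
  void c() { synchronized (other) { /* NOT excluded from a() or b() */ } }
}
{code}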



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers

2017-11-22 Thread Guangxu Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263652#comment-16263652
 ] 

Guangxu Cheng commented on HBASE-19325:
---

bq. Don't we need to preserve / provide the same method signature?
I will submit the branch-1 patch later. Thanks


> Pass a list of server name to postClearDeadServers
> --
>
> Key: HBASE-19325
> URL: https://issues.apache.org/jira/browse/HBASE-19325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Attachments: HBASE-19325.branch-2.001.patch
>
>
> Over on the tail of HBASE-18131. [~chia7712] said 
> {quote}
> (Revisiting the AccessController remind me of this issue) 
> Could we remove the duplicate code on the server side? Why not pass a list of 
> server name to postClearDeadServers and postListDeadServers?
> {quote}
> The duplicate code has been removed in HBASE-19131. Now pass a list of server 
> names to postClearDeadServers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers

2017-11-22 Thread Guangxu Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263651#comment-16263651
 ] 

Guangxu Cheng commented on HBASE-19325:
---

bq. What about passing the servers coming from the request to preClearDeadServers?
preClearDeadServers is called only in AccessController, and does not use the 
variable deadservers. So, I do not think it is necessary to pass dead servers 
to preClearDeadServers. WDYT? Thanks

> Pass a list of server name to postClearDeadServers
> --
>
> Key: HBASE-19325
> URL: https://issues.apache.org/jira/browse/HBASE-19325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Attachments: HBASE-19325.branch-2.001.patch
>
>
> Over on the tail of HBASE-18131. [~chia7712] said 
> {quote}
> (Revisiting the AccessController remind me of this issue) 
> Could we remove the duplicate code on the server side? Why not pass a list of 
> server name to postClearDeadServers and postListDeadServers?
> {quote}
> The duplicate code has been removed in HBASE-19131. Now pass a list of server 
> names to postClearDeadServers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19330) Remove duplicated dependency from hbase-rest

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263629#comment-16263629
 ] 

Hudson commented on HBASE-19330:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4100 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4100/])
HBASE-19330 Remove duplicated dependency from hbase-rest (stack: rev 
548ebbc574021ca22ba92633678d4f0cec70be0d)
* (edit) hbase-rest/pom.xml


> Remove duplicated dependency from hbase-rest
> 
>
> Key: HBASE-19330
> URL: https://issues.apache.org/jira/browse/HBASE-19330
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 3.0.0, 2.0.0-alpha-4
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Trivial
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19330.master.001.patch, 
> HBASE-19330.master.001.patch
>
>
> In hbase-rest module hbase-hadoop-compat dependency is listed twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19318) MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase AccessController implementation

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263605#comment-16263605
 ] 

Hadoop QA commented on HBASE-19318:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
50s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
9s{color} | {color:red} hbase-server: The patch generated 2 new + 16 unchanged 
- 0 fixed = 18 total (was 16) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
31s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
46m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m  
6s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db |
| JIRA Issue | HBASE-19318 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898946/HBASE-19318.001.branch-2.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 7a6e54b327de 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / 0ef7a24245 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9981/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9981/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9981/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was 

[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263602#comment-16263602
 ] 

Guanghao Zhang commented on HBASE-16868:


TestAsyncReplicationAdminApiWithClusters was added recently. Attaching a 011 
patch to fix it.

> Add a replicate_all flag to avoid misuse the namespaces and table-cfs config 
> of replication peer
> 
>
> Key: HBASE-16868
> URL: https://issues.apache.org/jira/browse/HBASE-16868
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-16868.master.001.patch, 
> HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, 
> HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, 
> HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, 
> HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, 
> HBASE-16868.master.010.patch, HBASE-16868.master.011.patch
>
>
> First add a new peer by shell cmd.
> {code}
> add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase"
> {code}
> If we don't set namespaces and table-cfs in the peer config, it means all 
> tables are replicated to the peer cluster.
> Then append a table to the peer config.
> {code}
> append_peer_tableCFs '1', {"table1" => []}
> {code}
> Then this peer will only replicate table1 to the peer cluster. It changes from 
> replicating all tables in the cluster to replicating only one table. This is 
> very easy to misuse in a production cluster, so we should avoid appending 
> tables to a peer which replicates all tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer

2017-11-22 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-16868:
---
Attachment: HBASE-16868.master.011.patch

> Add a replicate_all flag to avoid misuse the namespaces and table-cfs config 
> of replication peer
> 
>
> Key: HBASE-16868
> URL: https://issues.apache.org/jira/browse/HBASE-16868
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-16868.master.001.patch, 
> HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, 
> HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, 
> HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, 
> HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, 
> HBASE-16868.master.010.patch, HBASE-16868.master.011.patch
>
>
> First add a new peer by shell cmd.
> {code}
> add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase"
> {code}
> If we don't set namespaces and table-cfs in the peer config, it means all 
> tables are replicated to the peer cluster.
> Then append a table to the peer config.
> {code}
> append_peer_tableCFs '1', {"table1" => []}
> {code}
> Then this peer will only replicate table1 to the peer cluster. It changes from 
> replicating all tables in the cluster to replicating only one table. This is 
> very easy to misuse in a production cluster, so we should avoid appending 
> tables to a peer which replicates all tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263593#comment-16263593
 ] 

Ted Yu commented on HBASE-19333:


bq. Make sure it does not come off as 'instruction' or a 'command'.

I read the description again and don't see which part constitutes a command.

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.
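
For illustration, a hedged sketch of the reflection workaround described in the 
thread (illustrative only; conf, fs and snapshotDir are assumed to be in scope):

{code:java}
import java.lang.reflect.Method;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;

// Sketch of calling the private static method via reflection; this is exactly
// the kind of workaround a public POJO-based API would make unnecessary.
Method m = ExportSnapshot.class.getDeclaredMethod(
    "getSnapshotFiles", Configuration.class, FileSystem.class, Path.class);
m.setAccessible(true);
List<?> snapshotFiles = (List<?>) m.invoke(null, conf, fs, snapshotDir);
{code}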



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263578#comment-16263578
 ] 

Guanghao Zhang commented on HBASE-19301:


bq. we need to add doc that it will be a short-circuited connection
Yes, documenting it clearly will be better. :-)

> Provide way for CPs to create short circuited connection with custom 
> configurations
> ---
>
> Key: HBASE-19301
> URL: https://issues.apache.org/jira/browse/HBASE-19301
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, 
> HBASE-19301_V2.patch
>
>
> Over in HBASE-18359 we have discussions for this.
> Right now HBase provides getConnection() in RegionCPEnv, MasterCPEnv etc., but 
> this returns a pre-created connection (per server) which uses the configs from 
> hbase-site.xml on that server.
> Phoenix needs to create a connection in a CP with some custom configs. Having 
> these custom changes in hbase-site.xml is harmful, as that would affect all 
> connections created on that server.
> This issue is for providing an overloaded getConnection(Configuration) API.
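
For illustration, a hedged sketch of how a CP might use the proposed overload 
once it exists (the overload itself is what this issue proposes; env and the 
tuned property are assumptions):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Connection;

// Hypothetical usage of the proposed API, e.g. from a coprocessor's start():
// derive a per-CP config without touching the server's hbase-site.xml.
Configuration custom = new Configuration(env.getConfiguration());
custom.setInt("hbase.client.retries.number", 3); // CP-specific tuning
Connection conn = env.getConnection(custom);     // the proposed overload
{code}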



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263576#comment-16263576
 ] 

Guanghao Zhang commented on HBASE-19301:


Opened HBASE-19334 for the ACL problem. The ACL problem is an old problem for 
which I left a TODO. I misunderstood this issue and thought it might resolve 
the ACL problem... so I left a comment here. Now we can continue the discussion 
in HBASE-19334.

> Provide way for CPs to create short circuited connection with custom 
> configurations
> ---
>
> Key: HBASE-19301
> URL: https://issues.apache.org/jira/browse/HBASE-19301
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, 
> HBASE-19301_V2.patch
>
>
> Over in HBASE-18359 we have discussions for this.
> Right now HBase provides getConnection() in RegionCPEnv, MasterCPEnv etc., but 
> this returns a pre-created connection (per server) which uses the configs from 
> hbase-site.xml on that server.
> Phoenix needs to create a connection in a CP with some custom configs. Having 
> these custom changes in hbase-site.xml is harmful, as that would affect all 
> connections created on that server.
> This issue is for providing an overloaded getConnection(Configuration) API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations

2017-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263569#comment-16263569
 ] 

stack commented on HBASE-19301:
---

[~zghaobac] Ok. Thanks. Your point that we need to add doc saying it will be a 
short-circuited connection, and that if you do not want that you need to create 
your own connection outside of the CpEnv offerings, is a good comment.

> Provide way for CPs to create short circuited connection with custom 
> configurations
> ---
>
> Key: HBASE-19301
> URL: https://issues.apache.org/jira/browse/HBASE-19301
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, 
> HBASE-19301_V2.patch
>
>
> Over in HBASE-18359 we have discussions for this.
> Right now HBase provides getConnection() in RegionCPEnv, MasterCPEnv etc., but 
> this returns a pre-created connection (per server) which uses the configs from 
> hbase-site.xml on that server.
> Phoenix needs to create a connection in a CP with some custom configs. Having 
> these custom changes in hbase-site.xml is harmful, as that would affect all 
> connections created on that server.
> This issue is for providing an overloaded getConnection(Configuration) API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19323) Make netty engine default in hbase2

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263564#comment-16263564
 ] 

Hadoop QA commented on HBASE-19323:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  7m 
16s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
14s{color} | {color:red} hbase-server: The patch generated 1 new + 8 unchanged 
- 0 fixed = 9 total (was 8) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 9s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
69m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}124m 
44s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}218m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19323 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898932/HBASE-19323.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux dc093b52dabb 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 194efe3e5a |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9980/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9980/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9980/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263562#comment-16263562
 ] 

stack commented on HBASE-19333:
---

bq. HBASE-15762 Consider hbase-client to be shaded by default in 2.0

Yeah, bad idea there too.

bq. I looked for Tim's handle on JIRA. The first one was from 
fellowshipvillage. The second one has different spelling in last name.

You do it on the mailing list out in public so those addressed (or those 
watching) understand that key attributes of our community, ones we'd like to 
talk up, are encouraged participation and inclusion, and that there is no need 
for a mediator filing or fixing issues in our project. Be careful too how you 
make your suggestion. Make sure it does not come off as 'instruction' or a 
'command'.

bq. I just felt using reflection is not something that would be accepted in the 
open source version.

If there were a proper tool, it would not need to expose files.

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263549#comment-16263549
 ] 

Ted Yu commented on HBASE-19333:


bq. Why file on behalf of others?

I looked for Tim's handle on JIRA. The first one was from fellowshipvillage. 
The second one has a different spelling of the last name.

bq. Why not encourage them to file their own issues?

I have no problem changing the reporter to Tim. Will request the person to log 
JIRAs themselves in the future. I will also remind people who don't follow 
this practice.

bq. the problem is a cleaning tool

True. Open-sourcing the tool is another matter. I just felt using reflection 
is not something that would be accepted in the open-source version.

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19334) User.runAsLoginUser does not work in AccessController because it uses a short-circuited connection

2017-11-22 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-19334:
---
Description: 
The short-circuited connection bypasses the RPC layer, so the RPC context does 
not change. It still uses the old RPC user to write the ACL table, and 
User.runAsLoginUser does not work.

AccessController's grant method:
{code}
User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    // regionEnv is set at #start. Hopefully not null at this point.
    try (Table table = regionEnv.getConnection()
        .getTable(AccessControlLists.ACL_TABLE_NAME)) {
      AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm,
          table, request.getMergeExistingPermissions());
    }
    return null;
  }
});
{code}

  was:The short-circuited connection bypasses the RPC layer, so the RPC 
context does not change. It still uses the old RPC user to write the ACL 
table, and User.runAsLoginUser does not work.
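A hedged sketch of one possible direction (hypothetical, not a committed fix): 
build a regular connection instead of the short-circuited one, so the write to 
the ACL table goes through the normal RPC path as the login user.

{code:java}
User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    // A plain connection (not the CP environment's short-circuited one) goes
    // through the normal RPC path, so the login user's context is applied.
    try (Connection conn =
             ConnectionFactory.createConnection(regionEnv.getConfiguration());
         Table table = conn.getTable(AccessControlLists.ACL_TABLE_NAME)) {
      AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm,
          table, request.getMergeExistingPermissions());
    }
    return null;
  }
});
{code}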


> User.runAsLoginUser does not work in AccessController because it uses a 
> short-circuited connection
> 
>
> Key: HBASE-19334
> URL: https://issues.apache.org/jira/browse/HBASE-19334
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>
> The short-circuited connection bypasses the RPC layer, so the RPC context 
> does not change. It still uses the old RPC user to write the ACL table, and 
> User.runAsLoginUser does not work.
> AccessController's grant method:
> {code}
> User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
>   @Override
>   public Void run() throws Exception {
>     // regionEnv is set at #start. Hopefully not null at this point.
>     try (Table table = regionEnv.getConnection()
>         .getTable(AccessControlLists.ACL_TABLE_NAME)) {
>       AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm,
>           table, request.getMergeExistingPermissions());
>     }
>     return null;
>   }
> });
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19330) Remove duplicated dependency from hbase-rest

2017-11-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263546#comment-16263546
 ] 

Hudson commented on HBASE-19330:


FAILURE: Integrated in Jenkins build HBase-2.0 #898 (See 
[https://builds.apache.org/job/HBase-2.0/898/])
HBASE-19330 Remove duplicated dependency from hbase-rest (stack: rev 
0ef7a24245359ada37473b05e6cca4aad46b5225)
* (edit) hbase-rest/pom.xml


> Remove duplicated dependency from hbase-rest
> 
>
> Key: HBASE-19330
> URL: https://issues.apache.org/jira/browse/HBASE-19330
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies
>Affects Versions: 3.0.0, 2.0.0-alpha-4
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Trivial
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19330.master.001.patch, 
> HBASE-19330.master.001.patch
>
>
> In the hbase-rest module, the hbase-hadoop-compat dependency is listed twice.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263543#comment-16263543
 ] 

Ted Yu commented on HBASE-19333:


bq. And what's with the 'Consider'? Why file an issue for a 'Consideration'?

I am confused. There have been precedents, e.g.

HBASE-15762 Consider hbase-client to be shaded by default in 2.0

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19334) User.runAsLoginUser does not work in AccessController because it uses a short-circuited connection

2017-11-22 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-19334:
---
Description: The short-circuited connection bypasses the RPC layer, so the RPC 
context does not change. It still uses the old RPC user to write the ACL 
table, and User.runAsLoginUser does not work.

> User.runAsLoginUser does not work in AccessController because it uses a 
> short-circuited connection
> 
>
> Key: HBASE-19334
> URL: https://issues.apache.org/jira/browse/HBASE-19334
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>
> The short-circuited connection bypasses the RPC layer, so the RPC context 
> does not change. It still uses the old RPC user to write the ACL table, and 
> User.runAsLoginUser does not work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19334) User.runAsLoginUser does not work in AccessController because it uses a short-circuited connection

2017-11-22 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-19334:
--

 Summary: User.runAsLoginUser does not work in AccessController 
because it uses a short-circuited connection
 Key: HBASE-19334
 URL: https://issues.apache.org/jira/browse/HBASE-19334
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations

2017-11-22 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263536#comment-16263536
 ] 

Guanghao Zhang commented on HBASE-19301:


Sorry, sir. This issue is fine for resolving what its subject says. I commented 
here because I wanted to get some feedback from [~anoop.hbase]. I should open a 
new issue to discuss the ACL problem... I think that can be resolved later.


> Provide way for CPs to create short circuited connection with custom 
> configurations
> ---
>
> Key: HBASE-19301
> URL: https://issues.apache.org/jira/browse/HBASE-19301
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, 
> HBASE-19301_V2.patch
>
>
> Over in HBASE-18359 we have discussions for this.
> Right now HBase provides getConnection() in RegionCPEnv, MasterCPEnv etc. But 
> this returns a pre-created connection (per server). This uses the configs from 
> hbase-site.xml on that server.
> Phoenix needs to create connections in CPs with some custom configs. Making 
> such custom changes in hbase-site.xml is harmful, as that will affect all 
> connections being created on that server.
> This issue is for providing an overloaded getConnection(Configuration) API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263521#comment-16263521
 ] 

Duo Zhang commented on HBASE-16890:
---

BTW what do you mean by ‘completely asynchronous’? 

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2017-11-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263513#comment-16263513
 ] 

Duo Zhang commented on HBASE-16890:
---

MVCC is assigned before calling the consumer. It is an optimization done by 
[~carp84]. And what is your config for the YCSB test?

Thanks.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png, 
> ycsb_FSHlog.vs.Async.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-19329) hbase regionserver log output error (quota)

2017-11-22 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-19329.

Resolution: Invalid

Please use the user mailing list for help in debugging your system. JIRA is not 
the place for this.

https://hbase.apache.org/mail-lists.html

> hbase  regionserver log output error (quota)
> 
>
> Key: HBASE-19329
> URL: https://issues.apache.org/jira/browse/HBASE-19329
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: gehaijiang
>
> 2017-11-16 02:50:33,474 WARN  
> [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: 
> Unable to read user from quota table
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 3 
> actions: Table 'hbase:quota' was not found, got: hbase:namespace.: 3 times, 
> servers with issues: null
> ,
>   at 
> org.apache.hadoop.hbase.quotas.QuotaTableUtil.doGet(QuotaTableUtil.java:330)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaUtil.fetchUserQuotas(QuotaUtil.java:155)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore$3.fetchEntries(QuotaCache.java:256)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetch(QuotaCache.java:290)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetchUserQuotaState(QuotaCache.java:248)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:213)
> 2017-11-16 02:55:33,453 WARN  
> [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: 
> Unable to read namespace from quota table
> org.apache.hadoop.hbase.TableNotFoundException: Table 'hbase:quota' was not 
> found, got: hbase:namespace.
>   at 
> org.apache.hadoop.hbase.quotas.QuotaTableUtil.doGet(QuotaTableUtil.java:330)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaUtil.fetchGlobalQuotas(QuotaUtil.java:220)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaUtil.fetchNamespaceQuotas(QuotaUtil.java:207)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore$1.fetchEntries(QuotaCache.java:226)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetch(QuotaCache.java:290)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetchNamespaceQuotaState(QuotaCache.java:218)
>   at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:211)
> 2017-11-16 02:55:33,488 WARN  
> [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: 
> Unable to read table from quota table
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 
> 47 actions: Table 'hbase:quota' was not found, got: hbase:namespace.: 47 
> times, servers with issues: nu
> ll,



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19333.
---
Resolution: Incomplete

Resolving as incomplete/invalid.

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263488#comment-16263488
 ] 

stack commented on HBASE-19333:
---

Why file on behalf of others? Why not encourage them to file their own issues? 
Besides, the problem is a cleaning tool... Not "exposing 
ExportSnapshot#getSnapshotFiles through POJO class". And what's with the 
'Consider'? Why file an issue for a 'Consideration'? Filing issues should be 
more than 'Considerations'.

In fact, let me close this as ill-specified.

> Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
> 
>
> Key: HBASE-19333
> URL: https://issues.apache.org/jira/browse/HBASE-19333
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> In the thread, 
> http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
>  , Timothy mentioned that he used reflection to get to 
> ExportSnapshot#getSnapshotFiles().
> {code}
>   private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
>       final FileSystem fs, final Path snapshotDir) throws IOException {
> {code}
> SnapshotFileInfo is a protobuf class.
> We should consider exposing the API by replacing the protobuf class with a 
> POJO class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19266) TestAcidGuarantees should cover adaptive in-memory compaction

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263474#comment-16263474
 ] 

Ted Yu commented on HBASE-19266:


TestAcidGuaranteesWithBasicPolicy didn't finish in the QA run.
This might be due to the test environment.

> TestAcidGuarantees should cover adaptive in-memory compaction
> -
>
> Key: HBASE-19266
> URL: https://issues.apache.org/jira/browse/HBASE-19266
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Attachments: HBASE-19266.v0.patch
>
>
> Currently TestAcidGuarantees populates 3 policies of (in-memory) compaction.
> Adaptive in-memory compaction is new and should be added as the 4th 
> compaction policy.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster

2017-11-22 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-19317:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks, Ted.

> Increase 
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
>  to avoid host-related failures on MiniMRCluster
> 
>
> Key: HBASE-19317
> URL: https://issues.apache.org/jira/browse/HBASE-19317
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests, test
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19317.001.branch-2.patch, 
> HBASE-19317.002.branch-2.patch
>
>
> YARN (2.7.4, at least) defaults to requiring at least 10% of the disk to be 
> free on the local machine in order for the NodeManagers to function.
> On my development machine, despite having over 50G free, I would see the 
> warning from the NM that all the local dirs were bad, which would cause the 
> test to get stuck waiting to submit a MapReduce job. Surefire would 
> eventually kill the process.
> We should increase this value to avoid it causing us headaches.
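A minimal sketch of the kind of override this implies for the test setup (the 
99% value is illustrative, not necessarily the one that was committed):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Relax YARN's disk health check for the MiniMRCluster so NodeManagers stay
// healthy even on a nearly full development disk.
Configuration conf = new Configuration();
conf.setFloat(
    "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
    99.0f);
{code}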



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context

2017-11-22 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-19310:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks Ted and Stack.

> Verify IntegrationTests don't rely on Rules outside of JUnit context
> 
>
> Key: HBASE-19310
> URL: https://issues.apache.org/jira/browse/HBASE-19310
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19310.001.branch-2.patch, 
> HBASE-19310.002.branch-2.patch
>
>
> {noformat}
> 2017-11-16 00:43:41,204 INFO  [main] mapreduce.IntegrationTestImportTsv: 
> Running test testGenerateAndLoad.
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239)
> {noformat}
> (Potential line-number skew)
> {code}
>   @Test
>   public void testGenerateAndLoad() throws Exception {
> LOG.info("Running test testGenerateAndLoad.");
> final TableName table = TableName.valueOf(name.getMethodName());
> {code}
> The JUnit framework sets the test method name inside of the JUnit {{Rule}}. 
> When we invoke the test directly (a la {{hbase 
> org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this 
> {{getMethodName()}} returns {{null}} and we get the above stacktrace.
> Should make a pass over the ITs with main methods and {{Rule}}s to make sure 
> we don't have this lurking. Another alternative is to remove the main 
> methods and force use of {{IntegrationTestsDriver}} instead.
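A hypothetical guard for the NPE shown above (not the committed fix; the 
fallback name is only an example):

{code:java}
// If the TestName rule never fired (test invoked via main() rather than
// through JUnit), getMethodName() returns null; fall back to a fixed name.
String methodName = name.getMethodName();
final TableName table = TableName.valueOf(
    methodName != null ? methodName : "testGenerateAndLoad");
{code}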



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing

2017-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263451#comment-16263451
 ] 

stack commented on HBASE-19163:
---

Man. Great find [~huaxiang]. That's bad. Critical, yes. Did you get a chance 
to file an issue on it, sir?

On patch, LGTM. Nice. Was wondering about this bit:

} catch (Error error) {
  // The maximum lock count for a read lock is 64K (hardcoded). When this
  // maximum count is reached, an Error is thrown. It needs to be caught here
  // so we can go ahead and process the minibatch with the locks acquired.
  IOException ioe = new IOException();
  ioe.initCause(error);
  TraceUtil.addTimelineAnnotation("Error getting row lock");
  throw ioe;

The Error could be anything. It could be > 64k locks. It could be an OOME. I 
suppose there is no harm in catching it and trying to press on persisting the 
batch. Do we log that we entered this clause? Might be useful to add, if not, 
when trying to debug an odd issue. If we saw this log a few lines up, we'd 
know we were already in dire straits. Thanks.


> "Maximum lock count exceeded" from region server's batch processing
> ---
>
> Key: HBASE-19163
> URL: https://issues.apache.org/jira/browse/HBASE-19163
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-19163-master-v001.patch, 
> HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, 
> HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, 
> HBASE-19163.master.006.patch, unittest-case.diff
>
>
> In one of use cases, we found the following exception and replication is 
> stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN  [hconnection-0x28db294f-shared--pool4-t936] 
> client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last 
> exception: java.io.IOException: java.io.IOException: Maximum lock count 
> exceeded
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> ... 3 more
> {code}
> While we are still examining the data pattern, it is clear that there are too 
> many mutations in the batch against the same row; this exceeds the maximum 
> 64k shared lock count, throws an Error, and fails the whole batch.
> There are two approaches to solve this issue:
> 1) When there are multiple mutations against the same row in the batch, 
> acquire the lock once for that row instead of acquiring it for each mutation.
> 2) Catch the error, process whatever locks have been acquired, and loop back.
> With HBASE-17924, approach 1 seems easy to implement now.
> Created the JIRA; will post updates/patches as the investigation moves 
> forward.
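A minimal sketch of approach 1 (hypothetical, not the committed patch): key 
the acquired locks on the row so each row's shared lock is taken at most once 
per minibatch.

{code:java}
private List<Region.RowLock> acquireRowLocksOnce(Region region,
    List<Mutation> miniBatch) throws IOException {
  // One shared (read) lock per distinct row: a batch with many mutations
  // against the same row can then no longer exceed the hardcoded 64K
  // read-lock count of ReentrantReadWriteLock.
  Map<ByteBuffer, Region.RowLock> acquired = new HashMap<>();
  for (Mutation mutation : miniBatch) {
    ByteBuffer rowKey = ByteBuffer.wrap(mutation.getRow());
    if (!acquired.containsKey(rowKey)) {
      acquired.put(rowKey, region.getRowLock(mutation.getRow(), true));
    }
  }
  return new ArrayList<>(acquired.values());
}
{code}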



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263450#comment-16263450
 ] 

Hadoop QA commented on HBASE-19332:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 9s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
23s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
47m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m  
6s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19332 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898929/HBASE-19332.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 91b396108c96 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 194efe3e5a |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9979/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9979/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> DumpReplicationQueues misreports total WAL size
> 

[jira] [Created] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class

2017-11-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-19333:
--

 Summary: Consider exposing ExportSnapshot#getSnapshotFiles through 
POJO class
 Key: HBASE-19333
 URL: https://issues.apache.org/jira/browse/HBASE-19333
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


In the thread, 
http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3
 , Timothy mentioned that he used reflection to get to 
ExportSnapshot#getSnapshotFiles().
{code}
  private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
      final FileSystem fs, final Path snapshotDir) throws IOException {
{code}
SnapshotFileInfo is a protobuf class.

We should consider exposing the API by replacing the protobuf class with a 
POJO class.
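A hedged sketch of what such a POJO might look like (hypothetical; this issue 
was later resolved as incomplete, so nothing like it was committed):

{code:java}
// Hypothetical POJO mirroring the protobuf SnapshotFileInfo, so external
// callers would not need HBase's protobuf-generated classes on the classpath.
public class SnapshotFileDescriptor {
  public enum Type { HFILE, WAL }

  private final Type type;
  private final String path;  // file path relative to the snapshot directory
  private final long size;    // file length in bytes

  public SnapshotFileDescriptor(Type type, String path, long size) {
    this.type = type;
    this.path = path;
    this.size = size;
  }

  public Type getType() { return type; }
  public String getPath() { return path; }
  public long getSize() { return size; }
}
{code}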



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19204) branch-1.2 times out and is taking 6-7 hours to complete

2017-11-22 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263422#comment-16263422
 ] 

Xiao Chen commented on HBASE-19204:
---

Hi Stack,

The surefire tarball was how I could consistently reproduce the JVM issue, 
with the provided command. My reproduction was in a Docker container running 
Ubuntu (14.04, IIRC), but I think it should reproduce in any env with 7u151 
OpenJDK.

Not sure what the fix (or the exact JVM bug) is besides '7u161 doesn't have 
it!' :)

> branch-1.2 times out and is taking 6-7 hours to complete
> 
>
> Key: HBASE-19204
> URL: https://issues.apache.org/jira/browse/HBASE-19204
> Project: HBase
>  Issue Type: Umbrella
>  Components: test
>Reporter: stack
>
> Sean has been looking at tooling and infra. This umbrella is about looking 
> at actual tests. For example, running locally on a dedicated machine, I 
> picked a random test, TestPerColumnFamilyFlush. In my test run, it wrote 16M 
> lines. It seems to be having zk issues, but it is catching interrupts and 
> ignoring them ([~carp84] fixed this in later versions over in HBASE-18441).
> Let me try and do some fixup under this umbrella so we can get a 1.2.7 out 
> the door.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19204) branch-1.2 times out and is taking 6-7 hours to complete

2017-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263407#comment-16263407
 ] 

stack commented on HBASE-19204:
---

Thanks [~xiaochen] for coming by w/ helpful input. I should bundle the 
surefirebooter*.jar up into our Docker container?




> branch-1.2 times out and is taking 6-7 hours to complete
> 
>
> Key: HBASE-19204
> URL: https://issues.apache.org/jira/browse/HBASE-19204
> Project: HBase
>  Issue Type: Umbrella
>  Components: test
>Reporter: stack
>
> Sean has been looking at tooling and infra. This umbrella is about looking 
> at actual tests. For example, running locally on a dedicated machine, I 
> picked a random test, TestPerColumnFamilyFlush. In my test run, it wrote 16M 
> lines. It seems to be having zk issues, but it is catching interrupts and 
> ignoring them ([~carp84] fixed this in later versions over in HBASE-18441).
> Let me try and do some fixup under this umbrella so we can get a 1.2.7 out 
> the door.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

