[jira] [Updated] (HBASE-16033) Add more details in logging of responseTooSlow/TooLarge

2016-11-06 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-16033:
--
Affects Version/s: (was: 1.2.1)
   1.2.3
   1.1.7
Fix Version/s: 1.1.8
   1.2.4

Just picked this up in branch-1.1/1.2

> Add more details in logging of responseTooSlow/TooLarge
> ---
>
> Key: HBASE-16033
> URL: https://issues.apache.org/jira/browse/HBASE-16033
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21, 1.2.4, 1.1.8
>
> Attachments: HBASE-16033.patch, HBASE-16033.patch, HBASE-16033.patch
>
>
> Currently the log message when responseTooSlow/TooLarge is like:
> {noformat}
> 2016-06-08 12:18:04,363 WARN  
> [B.defaultRpcServer.handler=127,queue=10,port=16020]
> ipc.RpcServer: (responseTooSlow): 
> {"processingtimems":13125,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",
> "client":"11.251.158.22:36331","starttimems":1465359471238,"queuetimems":1540116,
> "class":"HRegionServer","responsesize":17,"method":"Multi"}
> {noformat}
> which is of little help for debugging, since we don't know which 
> table/region/row the request is against.
> What's more, there is an if-else check in the {{RpcServer#logResponse}} 
> method that tries to do something different when the {{param}} includes an 
> instance of {{Operation}}, but there's only one place invoking {{logResponse}} 
> and the {{param}} is always an instance of {{Message}}. Checking the change 
> history, I believe this is cleanup left over from the work on HBASE-8214.
> We will address the above issues, do some cleanup, and improve the log message 
> to include table/region/row information of the request, just as 
> {{RpcServer$Call#toString}} does.
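> A minimal sketch of the direction (illustrative only, not the final patch):
> {code}
> // Illustrative sketch: enrich the (responseTooSlow)/(responseTooLarge) JSON
> // with table/region/row details when the request maps to a client Operation.
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
> import org.apache.hadoop.hbase.client.Operation;
> import org.codehaus.jackson.map.ObjectMapper;
>
> public class SlowLogSketch {
>   private static final ObjectMapper MAPPER = new ObjectMapper();
>
>   static String buildResponseInfo(String call, long processingTimeMs,
>       Object param) throws IOException {
>     Map<String, Object> responseInfo = new HashMap<String, Object>();
>     responseInfo.put("call", call);
>     responseInfo.put("processingtimems", processingTimeMs);
>     if (param instanceof Operation) {
>       // Operation#toMap carries table/row details, much like
>       // RpcServer$Call#toString does for the call.
>       responseInfo.put("operation", ((Operation) param).toMap());
>     }
>     return MAPPER.writeValueAsString(responseInfo);
>   }
> }
> {code}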





[jira] [Updated] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-16972:
--
Affects Version/s: 1.2.3
   1.1.7
Fix Version/s: 1.1.8
   1.2.4
   1.4.0
   2.0.0

> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}
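> A minimal sketch of how the new {{scandetails}} string could be composed from 
> the region resolved via the request's {{scanner_id}} (illustrative; the actual 
> patch threads this through {{RSRpcServices}} and {{RpcServerInterface}}):
> {code}
> // Illustrative sketch: render "table: <table> region: <region name>" so the
> // slow-scan log line identifies the scan target.
> import org.apache.hadoop.hbase.HRegionInfo;
>
> public class ScanDetailsSketch {
>   static String buildScanDetails(HRegionInfo regionInfo) {
>     return "table: " + regionInfo.getTable().getNameAsString()
>         + " region: " + regionInfo.getRegionNameAsString();
>   }
> }
> {code}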





[jira] [Updated] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-16972:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Done pushing into all 1.1+ branches except for 1.3 (HBASE-17011 will do the 
backport for 1.3.1). Thanks all for review.

> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}





[jira] [Created] (HBASE-17034) HTable#exist and HTable#existAll are flaky

2016-11-06 Thread ChiaPing Tsai (JIRA)
ChiaPing Tsai created HBASE-17034:
-

 Summary: HTable#exist and HTable#existAll are flaky
 Key: HBASE-17034
 URL: https://issues.apache.org/jira/browse/HBASE-17034
 Project: HBase
  Issue Type: Bug
Reporter: ChiaPing Tsai
Priority: Minor



# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?
# Should HTable#exist be implemented via HTable#existAll? If so, it may be a 
duplicate of [HBASE-16953|https://issues.apache.org/jira/browse/HBASE-16593]. 
If not, it seems to me that we should unify all the exist methods via AP. :)

Any comment? Thanks.





[jira] [Updated] (HBASE-17034) HTable#exist and HTable#existAll are flaky

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Description: 
# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?
# Should HTable#exist be implemented via HTable#existAll? If so, it may be a 
duplicate of [HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623]. 
If not, it seems to me that we should unify all the exist methods via AP. :)

Any comment? Thanks.

  was:

# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?
# Should HTable#exist be implemented via HTable#existAll? If so, it may be a 
duplicate of [HBASE-16953|https://issues.apache.org/jira/browse/HBASE-16593]. 
If not, it seems to me that we should unify all the exist methods via AP. :)

Any comment? Thanks.


> HTable#exist and HTable#existAll are flaky
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> # HTable#exist applies the default consistency, but HTable#existAll doesn't
> # HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
> (HTable#get(Get, boolean) clones the passed Get)
> So we have some issues outlined below.
> # Can the passed Get be modified? If so, we can save the clone of the Get. 
> Note that HTable#getScanner() modifies the passed Scan for some unset 
> variables, so I guess it is OK to modify the passed Get. 
> # Should we assign the default Consistency to a passed Get whose Consistency 
> is null?
> # Should HTable#exist be implemented via HTable#existAll? If so, it may be a 
> duplicate of 
> [HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623]. If not, it 
> seems to me that we should unify all the exist methods via AP. :)
> Any comment? Thanks.





[jira] [Updated] (HBASE-17034) HTable#exist and HTable#existAll are flaky

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Description: 
# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?

This jira may be solved by 
[HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623].

Any comment? Thanks.

  was:
# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?
# Should HTable#exist be implemented via HTable#existAll? If so, it may be a 
duplicate of [HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623]. 
If not, it seems to me that we should unify all the exist methods via AP. :)

Any comment? Thanks.


> HTable#exist and HTable#existAll are flaky
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> # HTable#exist applies the default consistency, but HTable#existAll doesn't
> # HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
> (HTable#get(Get, boolean) clones the passed Get)
> So we have some issues outlined below.
> # Can the passed Get be modified? If so, we can save the clone of the Get. 
> Note that HTable#getScanner() modifies the passed Scan for some unset 
> variables, so I guess it is OK to modify the passed Get. 
> # Should we assign the default Consistency to a passed Get whose Consistency 
> is null?
> This jira may be solved by 
> [HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623].
> Any comment? Thanks.





[jira] [Updated] (HBASE-16575) unify the semantic of RRCI#callWithRetries and RRCI#callWithoutRetries when the maxAttempts is configured to one

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-16575:
--
Issue Type: Improvement  (was: Bug)

> unify the semantic of RRCI#callWithRetries and RRCI#callWithoutRetries when 
> the maxAttempts is configured to one
> 
>
> Key: HBASE-16575
> URL: https://issues.apache.org/jira/browse/HBASE-16575
> Project: HBase
>  Issue Type: Improvement
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> It seems to me that RRCI#callWithRetries and RRCI#callWithoutRetries should 
> have the same logic if the maxAttempts is configured to one, but they 
> currently differ in:
> 1) timeout
> 2) failure handling
> A quick solution is to always call RRCI#callWithRetries from 
> RRCI#callWithoutRetries when the maxAttempts is configured to one.
> Any comment? Thanks.
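> A minimal sketch of that delegation (method names follow RRCI's current entry 
> points; treat the maxAttempts field as an assumption about the implementation):
> {code}
> // Illustrative sketch: with a single attempt configured, reuse the retrying
> // path so timeout and failure handling are identical for both entry points.
> public T callWithoutRetries(RetryingCallable<T> callable, int callTimeout)
>     throws IOException, RuntimeException {
>   if (maxAttempts == 1) {
>     return callWithRetries(callable, callTimeout);
>   }
>   // ... existing no-retry logic ...
> }
> {code}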





[jira] [Created] (HBASE-17035) Check why we roll a wal writer at 10MB when the configured roll size is 120M+ with AsyncFSWAL

2016-11-06 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17035:
-

 Summary: Check why we roll a wal writer at 10MB when the 
configured roll size is 120M+ with AsyncFSWAL
 Key: HBASE-17035
 URL: https://issues.apache.org/jira/browse/HBASE-17035
 Project: HBase
  Issue Type: Sub-task
  Components: wal
Affects Versions: 2.0.0
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 2.0.0


Found this when addressing HBASE-16890. It is one possible reason why 
AsyncFSWAL performs worse than FSHLog when running the PE tool.

https://issues.apache.org/jira/browse/HBASE-16890?focusedCommentId=15636688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15636688





[jira] [Commented] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641845#comment-15641845
 ] 

Hudson commented on HBASE-16972:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #64 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/64/])
HBASE-16972 Log more details for Scan#next request when responseTooSlow (liyu: 
rev aeef1661c731bc4139509ecf926fcd8207b33049)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}





[jira] [Commented] (HBASE-16033) Add more details in logging of responseTooSlow/TooLarge

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641844#comment-15641844
 ] 

Hudson commented on HBASE-16033:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #64 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/64/])
HBASE-16033 Add more details in logging of responseTooSlow/TooLarge (liyu: rev 
b206809d330bdbb048c472c9809663d074052c3e)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Add more details in logging of responseTooSlow/TooLarge
> ---
>
> Key: HBASE-16033
> URL: https://issues.apache.org/jira/browse/HBASE-16033
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21, 1.2.4, 1.1.8
>
> Attachments: HBASE-16033.patch, HBASE-16033.patch, HBASE-16033.patch
>
>
> Currently the log message when responseTooSlow/TooLarge is like:
> {noformat}
> 2016-06-08 12:18:04,363 WARN  
> [B.defaultRpcServer.handler=127,queue=10,port=16020]
> ipc.RpcServer: (responseTooSlow): 
> {"processingtimems":13125,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",
> "client":"11.251.158.22:36331","starttimems":1465359471238,"queuetimems":1540116,
> "class":"HRegionServer","responsesize":17,"method":"Multi"}
> {noformat}
> which is of little help for debugging, since we don't know which 
> table/region/row the request is against.
> What's more, there is an if-else check in the {{RpcServer#logResponse}} 
> method that tries to do something different when the {{param}} includes an 
> instance of {{Operation}}, but there's only one place invoking {{logResponse}} 
> and the {{param}} is always an instance of {{Message}}. Checking the change 
> history, I believe this is cleanup left over from the work on HBASE-8214.
> We will address the above issues, do some cleanup, and improve the log message 
> to include table/region/row information of the request, just as 
> {{RpcServer$Call#toString}} does.





[jira] [Comment Edited] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636688#comment-15636688
 ] 

Duo Zhang edited comment on HBASE-16890 at 11/6/16 2:08 PM:


Ah, I could also observe the same result with a larger data set: FSHLog is 
faster. And I think I found the direct reason.

The command is
{noformat}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred 
--presplit=50 --size=50 --columns=50 --valueSize=200 --writeToWAL=true 
--bloomFilter=NONE randomWrite 50
{noformat}

When running with FSHLog, we flushed 702 times and the average flush size is 
85.3227MB. For AsyncFSWAL with my patch, we flushed 850 times and the average 
flush size is 71.609MB.

Usually the flush is triggered by the log roller because of too many WAL files.
{noformat}
2016-11-04 22:28:18,925 INFO  
[regionserver/c4-hadoop-build01.bj/10.132.4.49:16020.logRoller] 
wal.AbstractFSWAL: Too many WALs; count=33, max=32; forcing flush of 6 
regions(s): 7f18ef6867d0a36627930da34818069f, 7fdd29e6e2e6be
34b2ea97c9a06281d0, d12c296bd1cb70b2ce78e9a3bc914318, 
9207d10a0f22877079d3896d6cb6ebb2, d2b6ac38e6edf675225a71748fb1274e, 
ad371a623567b35a784256e4f05c5f3a
{noformat}

For FSHLog, we rolled 491 times and the average roll size is 130.329MB. For 
AsyncFSWAL with my patch, we rolled 584 times and the average roll size is 
109.666MB.

In general, the roll size of FSHLog is a little larger than AsyncFSWAL's (which 
means AsyncFSWAL is a little faster when rolling?). But I think the main reason 
is that, for AsyncFSWAL, there are 78 rolls whose file size is far from the 
configured roll size: the configured roll size is 120M+ but we actually roll 
the wal writer when the file size is only between 10MB and 20MB. (As a sanity 
check, 491 × 130.329MB and 584 × 109.666MB are both roughly 64,000MB, so the 
extra rolls reflect abnormally small files rather than extra data.) I think 
this is why we perform so badly in PE. Need to find out where the abnormal 
rolling comes from.

Have you guys observed the same thing? [~ram_krish] [~anoopsamjohn].

Will go out for two days this weekend. Will come back to dig next Monday.

Thanks.


was (Author: apache9):
Ah, I could also observe the same result with a larger data set: FSHLog is 
faster. And I think I found the direct reason.

The command is
{noformat}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred 
--presplit=50 --size=50 --columns=50 --valueSize=200 --writeToWAL=true 
--bloomFilter=NONE randomWrite 50
{noformat}

When running with FSHLog, we flushed 702 times and the average flush size is 
85.3227MB. For AsyncFSWAL with my patch, we flushed 850 times and the average 
flush size is 71.609MB.

Usually the flush is triggered by the log roller because of too many WAL files.
{noformat}
2016-11-04 22:28:18,925 INFO  
[regionserver/c4-hadoop-build01.bj/10.132.4.49:16020.logRoller] 
wal.AbstractFSWAL: Too many WALs; count=33, max=32; forcing flush of 6 
regions(s): 7f18ef6867d0a36627930da34818069f, 7fdd29e6e2e6be
34b2ea97c9a06281d0, d12c296bd1cb70b2ce78e9a3bc914318, 
9207d10a0f22877079d3896d6cb6ebb2, d2b6ac38e6edf675225a71748fb1274e, 
ad371a623567b35a784256e4f05c5f3a
{noformat}

For FSHLog, we rolled 491 times and the average roll size is 130.329MB. For 
AsyncFSWAL with my patch, we rolled 584 times and the average roll size is 
109.666MB.

In general, the roll size of FSHLog is a little larger than AsyncFSWAL's (which 
means AsyncFSWAL is a little faster when rolling?). But I think the main reason 
is that, for AsyncFSWAL, there are 78 rolls whose size is far from the 
configured roll size, between 10MB and 20MB. I think this is why we perform so 
badly in PE. Need to find out where the abnormal rolling comes from.

Have you guys observed the same thing? [~ram_krish] [~anoopsamjohn].

Will go out for two days this weekend. Will come back to dig next Monday.

Thanks.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contenti

[jira] [Commented] (HBASE-16033) Add more details in logging of responseTooSlow/TooLarge

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641984#comment-15641984
 ] 

Hudson commented on HBASE-16033:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1897 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1897/])
HBASE-16033 Add more details in logging of responseTooSlow/TooLarge (liyu: rev 
14f5f1c1912e200926d61e9a9c9db246a1380fb7)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Add more details in logging of responseTooSlow/TooLarge
> ---
>
> Key: HBASE-16033
> URL: https://issues.apache.org/jira/browse/HBASE-16033
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21, 1.2.4, 1.1.8
>
> Attachments: HBASE-16033.patch, HBASE-16033.patch, HBASE-16033.patch
>
>
> Currently the log message when responseTooSlow/TooLarge is like:
> {noformat}
> 2016-06-08 12:18:04,363 WARN  
> [B.defaultRpcServer.handler=127,queue=10,port=16020]
> ipc.RpcServer: (responseTooSlow): 
> {"processingtimems":13125,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",
> "client":"11.251.158.22:36331","starttimems":1465359471238,"queuetimems":1540116,
> "class":"HRegionServer","responsesize":17,"method":"Multi"}
> {noformat}
> which is of little help for debugging, since we don't know which 
> table/region/row the request is against.
> What's more, there is an if-else check in the {{RpcServer#logResponse}} 
> method that tries to do something different when the {{param}} includes an 
> instance of {{Operation}}, but there's only one place invoking {{logResponse}} 
> and the {{param}} is always an instance of {{Message}}. Checking the change 
> history, I believe this is cleanup left over from the work on HBASE-8214.
> We will address the above issues, do some cleanup, and improve the log message 
> to include table/region/row information of the request, just as 
> {{RpcServer$Call#toString}} does.





[jira] [Commented] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641985#comment-15641985
 ] 

Hudson commented on HBASE-16972:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1897 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1897/])
HBASE-16972 Log more details for Scan#next request when responseTooSlow (liyu: 
rev 94c52d32dfa27dfda277e4526bbfc6d8e47d4abd)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java


> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}





[jira] [Commented] (HBASE-16033) Add more details in logging of responseTooSlow/TooLarge

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641993#comment-15641993
 ] 

Hudson commented on HBASE-16033:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #58 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/58/])
HBASE-16033 Add more details in logging of responseTooSlow/TooLarge (liyu: rev 
b206809d330bdbb048c472c9809663d074052c3e)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Add more details in logging of responseTooSlow/TooLarge
> ---
>
> Key: HBASE-16033
> URL: https://issues.apache.org/jira/browse/HBASE-16033
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21, 1.2.4, 1.1.8
>
> Attachments: HBASE-16033.patch, HBASE-16033.patch, HBASE-16033.patch
>
>
> Currently the log message when responseTooSlow/TooLarge is like:
> {noformat}
> 2016-06-08 12:18:04,363 WARN  
> [B.defaultRpcServer.handler=127,queue=10,port=16020]
> ipc.RpcServer: (responseTooSlow): 
> {"processingtimems":13125,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",
> "client":"11.251.158.22:36331","starttimems":1465359471238,"queuetimems":1540116,
> "class":"HRegionServer","responsesize":17,"method":"Multi"}
> {noformat}
> which is of little help for debugging, since we don't know which 
> table/region/row the request is against.
> What's more, there is an if-else check in the {{RpcServer#logResponse}} 
> method that tries to do something different when the {{param}} includes an 
> instance of {{Operation}}, but there's only one place invoking {{logResponse}} 
> and the {{param}} is always an instance of {{Message}}. Checking the change 
> history, I believe this is cleanup left over from the work on HBASE-8214.
> We will address the above issues, do some cleanup, and improve the log message 
> to include table/region/row information of the request, just as 
> {{RpcServer$Call#toString}} does.





[jira] [Commented] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15641994#comment-15641994
 ] 

Hudson commented on HBASE-16972:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #58 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/58/])
HBASE-16972 Log more details for Scan#next request when responseTooSlow (liyu: 
rev aeef1661c731bc4139509ecf926fcd8207b33049)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}





[jira] [Commented] (HBASE-16972) Log more details for Scan#next request when responseTooSlow

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642062#comment-15642062
 ] 

Hudson commented on HBASE-16972:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1813 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1813/])
HBASE-16972 Log more details for Scan#next request when responseTooSlow (liyu: 
rev 94c52d32dfa27dfda277e4526bbfc6d8e47d4abd)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java


> Log more details for Scan#next request when responseTooSlow
> ---
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, 
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on the scan.next call, we will get a 
> warn log like below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what 
> exactly this scan is about, such as which region of which table it runs against.
> After this JIRA, we will improve the message to something like below (notice 
> the last line):
> {noformat}
> 2016-10-31 11:43:23,430 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] 
> ipc.RpcServer(2574):
> (responseTooSlow): 
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
>  11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true 
> client_handles_heartbeats: true
> track_scan_metrics: false renew: 
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster",
> "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"}
> {noformat}





[jira] [Commented] (HBASE-16033) Add more details in logging of responseTooSlow/TooLarge

2016-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642061#comment-15642061
 ] 

Hudson commented on HBASE-16033:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1813 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1813/])
HBASE-16033 Add more details in logging of responseTooSlow/TooLarge (liyu: rev 
14f5f1c1912e200926d61e9a9c9db246a1380fb7)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Add more details in logging of responseTooSlow/TooLarge
> ---
>
> Key: HBASE-16033
> URL: https://issues.apache.org/jira/browse/HBASE-16033
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 1.2.3, 1.1.7
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21, 1.2.4, 1.1.8
>
> Attachments: HBASE-16033.patch, HBASE-16033.patch, HBASE-16033.patch
>
>
> Currently the log message when responseTooSlow/TooLarge is like:
> {noformat}
> 2016-06-08 12:18:04,363 WARN  
> [B.defaultRpcServer.handler=127,queue=10,port=16020]
> ipc.RpcServer: (responseTooSlow): 
> {"processingtimems":13125,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)",
> "client":"11.251.158.22:36331","starttimems":1465359471238,"queuetimems":1540116,
> "class":"HRegionServer","responsesize":17,"method":"Multi"}
> {noformat}
> which is of little help for debugging, since we don't know which 
> table/region/row the request is against.
> What's more, there is an if-else check in the {{RpcServer#logResponse}} 
> method that tries to do something different when the {{param}} includes an 
> instance of {{Operation}}, but there's only one place invoking {{logResponse}} 
> and the {{param}} is always an instance of {{Message}}. Checking the change 
> history, I believe this is cleanup left over from the work on HBASE-8214.
> We will address the above issues, do some cleanup, and improve the log message 
> to include table/region/row information of the request, just as 
> {{RpcServer$Call#toString}} does.





[jira] [Created] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)
ChiaPing Tsai created HBASE-17036:
-

 Summary: Unify HTable#checkAndPut with AP
 Key: HBASE-17036
 URL: https://issues.apache.org/jira/browse/HBASE-17036
 Project: HBase
  Issue Type: Sub-task
Reporter: ChiaPing Tsai
Priority: Minor


It is similar to HTable#checkAndDelete.





[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Description: The implementation is similar to HTable#checkAndDelete.  (was: 
It is similar to HTable#checkAndDelete.)

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Attachment: HBASE-17036.v0.patch

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Reporter: ChiaPing Tsai
>Priority: Minor
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Status: Patch Available  (was: Open)

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Reporter: ChiaPing Tsai
>Priority: Minor
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Assigned] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai reassigned HBASE-17036:
-

Assignee: ChiaPing Tsai

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Fix Version/s: 2.0.0

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Affects Version/s: 2.0.0

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.





[jira] [Commented] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642242#comment-15642242
 ] 

Hadoop QA commented on HBASE-17036:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 4s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
7s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837656/HBASE-17036.v0.patch |
| JIRA Issue | HBASE-17036 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 906c63ec3570 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4349/testReport/ |
| modules | C: hbase-client U: hbase-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4349/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>  

[jira] [Created] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-17037:
--

 Summary: Enhance LoadIncrementalHFiles API to convey loaded files
 Key: HBASE-17037
 URL: https://issues.apache.org/jira/browse/HBASE-17037
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
provide a means for the caller to get the collection of Paths for the loaded 
hfiles.

The functionality added by HBASE-16821 is preserved, as shown by the modified 
TestLoadIncrementalHFiles#testSimpleLoadWithMap.
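A hypothetical caller-side sketch of the enhancement (the doBulkLoad signature 
and the Collection<Path> return type here are assumptions for illustration, not 
the committed API):
{code}
// Assumed shape: the map-based bulk-load entry point additionally reports
// which HFile Paths were actually loaded.
Map<byte[], List<Path>> family2Files = ...; // family -> hfiles to load
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
Collection<Path> loadedPaths =
    loader.doBulkLoad(family2Files, admin, table, regionLocator); // assumed
System.out.println("Loaded " + loadedPaths.size() + " hfiles: " + loadedPaths);
{code}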





[jira] [Updated] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Summary: avoid unnecessary Get copy in HTable#exist  (was: HTable#exist and 
HTable#existAll are flaky)

> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> # HTable#exist applies the default consistency, but HTable#existAll doesn't
> # HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
> (HTable#get(Get, boolean) clones the passed Get)
> So we have some issues outlined below.
> # Can the passed Get be modified? If so, we can save the clone of the Get. 
> Note that HTable#getScanner() modifies the passed Scan for some unset 
> variables, so I guess it is OK to modify the passed Get. 
> # Should we assign the default Consistency to a passed Get whose Consistency 
> is null?
> This jira may be solved by 
> [HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623].
> Any comment? Thanks.





[jira] [Updated] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Description: 
{code:title=HTable.java|borderStyle=solid}
private Result get(Get get, final boolean checkExistenceOnly) throws IOException {
  if (get.isCheckExistenceOnly() != checkExistenceOnly
      || get.getConsistency() == null) {
    get = ReflectionUtils.newInstance(get.getClass(), get);
    get.setCheckExistenceOnly(checkExistenceOnly);
    if (get.getConsistency() == null) {
      get.setConsistency(defaultConsistency);
    }
  }
  ...
}
{code}
Can the passed Get be modified? If so, we can just change the passed Get. If 
not, we can record the values returned by isCheckExistenceOnly() and 
getConsistency() to avoid the Get copy.

It seems to me that it is OK to modify the passed Get.
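A minimal sketch of the record-and-restore alternative (assumes mutating the 
passed Get during the call is acceptable; doGet() is a placeholder for the 
existing request logic, not a real method):
{code}
private Result get(Get get, final boolean checkExistenceOnly) throws IOException {
  // Record the caller's settings instead of cloning the Get.
  final boolean userCheckExistenceOnly = get.isCheckExistenceOnly();
  final Consistency userConsistency = get.getConsistency();
  get.setCheckExistenceOnly(checkExistenceOnly);
  if (get.getConsistency() == null) {
    get.setConsistency(defaultConsistency);
  }
  try {
    return doGet(get); // placeholder for the existing RPC path
  } finally {
    // Restore the recorded values so the caller sees an unchanged Get.
    get.setCheckExistenceOnly(userCheckExistenceOnly);
    get.setConsistency(userConsistency);
  }
}
{code}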

Any comment? Thanks.

  was:
# HTable#exist applies the default consistency, but HTable#existAll doesn't
# HTable#existAll may change the passed Gets, but HTable#exist doesn't. 
(HTable#get(Get, boolean) clones the passed Get)

So we have some issues outlined below.
# Can the passed Get be modified? If so, we can save the clone of the Get. 
Note that HTable#getScanner() modifies the passed Scan for some unset 
variables, so I guess it is OK to modify the passed Get. 
# Should we assign the default Consistency to a passed Get whose Consistency 
is null?

This jira may be solved by 
[HBASE-16623|https://issues.apache.org/jira/browse/HBASE-16623].

Any comment? Thanks.


> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Bug
>Reporter: ChiaPing Tsai
>Priority: Minor
>
> {code:title=HTable.java|borderStyle=solid}
> private Result get(Get get, final boolean checkExistenceOnly) throws IOException {
>   if (get.isCheckExistenceOnly() != checkExistenceOnly
>       || get.getConsistency() == null) {
>     get = ReflectionUtils.newInstance(get.getClass(), get);
>     get.setCheckExistenceOnly(checkExistenceOnly);
>     if (get.getConsistency() == null) {
>       get.setConsistency(defaultConsistency);
>     }
>   }
>   ...
> }
> {code}
> Can the passed Get be modified? If so, we can just change the passed Get. If 
> not, we can record the values returned by isCheckExistenceOnly() and 
> getConsistency() to avoid the Get copy.
> It seems to me that it is OK to modify the passed Get.
> Any comment? Thanks.





[jira] [Updated] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
 Assignee: ChiaPing Tsai
Fix Version/s: 2.0.0
Affects Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17034.v0.patch
>
>
> {code:title=HTable.java|borderStyle=solid}
> private Result get(Get get, final boolean checkExistenceOnly) throws 
> IOException {
> if (get.isCheckExistenceOnly() != checkExistenceOnly || 
> get.getConsistency() == null) {
>   get = ReflectionUtils.newInstance(get.getClass(), get);
>   get.setCheckExistenceOnly(checkExistenceOnly);
>   if (get.getConsistency() == null){
> get.setConsistency(defaultConsistency);
>   }
> }
>   ...
> }
> {code}
> Can the passed Get be modified? If so, we can just change the passed Get. If 
> not, we can record the values returned by isCheckExistenceOnly() and 
> getConsistency() to avoid the Get copy.
> It seems to me that it is ok to modify the passed Get.
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Issue Type: Improvement  (was: Bug)

> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17034.v0.patch
>
>
> {code:title=HTable.java|borderStyle=solid}
> private Result get(Get get, final boolean checkExistenceOnly) throws 
> IOException {
> if (get.isCheckExistenceOnly() != checkExistenceOnly || 
> get.getConsistency() == null) {
>   get = ReflectionUtils.newInstance(get.getClass(), get);
>   get.setCheckExistenceOnly(checkExistenceOnly);
>   if (get.getConsistency() == null){
> get.setConsistency(defaultConsistency);
>   }
> }
>   ...
> }
> {code}
> Can the passed Get be modified? If so, we can just change the passed Get. If 
> not, we can record the values returned by isCheckExistenceOnly() and 
> getConsistency() to avoid the Get copy.
> It seems to me that it is ok to modify the passed Get.
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17034:
--
Attachment: HBASE-17034.v0.patch

> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17034.v0.patch
>
>
> {code:title=HTable.java|borderStyle=solid}
> private Result get(Get get, final boolean checkExistenceOnly) throws 
> IOException {
> if (get.isCheckExistenceOnly() != checkExistenceOnly || 
> get.getConsistency() == null) {
>   get = ReflectionUtils.newInstance(get.getClass(), get);
>   get.setCheckExistenceOnly(checkExistenceOnly);
>   if (get.getConsistency() == null){
> get.setConsistency(defaultConsistency);
>   }
> }
>   ...
> }
> {code}
> Can the passed Get be modified? If so, we can just change the passed Get. If 
> not, we can record the values returned by isCheckExistenceOnly() and 
> getConsistency() to avoid the Get copy.
> It seems to me that it is ok to modify the passed Get.
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17034) avoid unnecessary Get copy in HTable#exist

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642770#comment-15642770
 ] 

Hadoop QA commented on HBASE-17034:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
8s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 2s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
7s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 22s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837672/HBASE-17034.v0.patch |
| JIRA Issue | HBASE-17034 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 146796588ef2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4350/testReport/ |
| modules | C: hbase-client U: hbase-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4350/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> avoid unnecessary Get copy in HTable#exist
> --
>
> Key: HBASE-17034
> URL: https://issues.apache.org/jira/browse/HBASE-17034
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai

[jira] [Updated] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17037:
---
Attachment: 17037.v2.txt

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
> Attachments: 17037.v2.txt
>
>
> When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17037:
---
Attachment: (was: 17037.v2.txt)

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>
> When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17037:
---
Attachment: 17037.v2.txt

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
> Attachments: 17037.v2.txt
>
>
> When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17037:
---
Status: Patch Available  (was: Open)

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
> Attachments: 17037.v2.txt
>
>
> When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642836#comment-15642836
 ] 

Hadoop QA commented on HBASE-17037:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 32s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 32s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 32s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 12s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 2m 23s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 35s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 48s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.6.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 59s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 10s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.7.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 20s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.7.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 32s 
{color} | {color:red} The patch causes 19 errors with Hadoop v2.7.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 42s 
{color} | {color:red} The patch causes 19 errors with Hadoop v3.0.0-alpha1. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
7s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 14s {color} 
| {color:black} {color} |
\\
\\
|| Subsy

[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Status: Open  (was: Patch Available)

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch
>
>
> The implementation is similar to HTable#checkAndDelete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Status: Patch Available  (was: Open)

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch, HBASE-17036.v1.patch
>
>
> The implementation is similar to HTable#checkAndDelete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-17036:
--
Attachment: HBASE-17036.v1.patch

Run QA for hbase-server module

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch, HBASE-17036.v1.patch
>
>
> The implementation is similar to HTable#checkAndDelete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17037:
---
Attachment: 17037.v3.txt

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
> Attachments: 17037.v2.txt, 17037.v3.txt
>
>
> When a Map<byte[], List<Path>> is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap
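
A hedged sketch of how a caller might consume such an API; the return shape 
and the exact run(...) arguments below are assumptions for illustration, not 
the committed signature:
{code:title=caller sketch|borderStyle=solid}
// Hypothetical: suppose the map-based bulk load handed back the loaded paths.
Map<byte[], List<Path>> family2Files = buildFamilyToHFileMap(); // hypothetical helper
Collection<Path> loaded = loader.run(family2Files, tableName);  // assumed return type
for (Path hfile : loaded) {
  LOG.info("bulk-loaded hfile: " + hfile);
}
{code}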



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16666) Add append and remove peer namespaces cmds for replication

2016-11-06 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-16666:
---
Resolution: Resolved
Status: Resolved  (was: Patch Available)

> Add append and remove peer namespaces cmds for replication
> --
>
> Key: HBASE-16666
> URL: https://issues.apache.org/jira/browse/HBASE-16666
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-16666-v1.patch, HBASE-16666-v2.patch, 
> HBASE-16666.patch
>
>
> After HBASE-16447, we support replication by a namespaces config in the peer. 
> Like append_peer_tableCFs and remove_peer_tableCFs, I think we need two new 
> shell cmds: append_peer_namespaces and remove_peer_namespaces. Then we can 
> easily change the namespaces config.
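
A usage sketch of the two cmds; the command names come from this issue, while 
the peer id, namespace names, and argument shape are illustrative assumptions 
(by analogy with append_peer_tableCFs):
{noformat}
hbase> append_peer_namespaces '1', ['ns1', 'ns2']
hbase> remove_peer_namespaces '1', ['ns1']
{noformat}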



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16947) Some improvements for DumpReplicationQueues tool

2016-11-06 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-16947:
---
Resolution: Resolved
Status: Resolved  (was: Patch Available)

> Some improvements for DumpReplicationQueues tool
> 
>
> Key: HBASE-16947
> URL: https://issues.apache.org/jira/browse/HBASE-16947
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability, Replication
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: HBASE-16947-branch-1.patch, HBASE-16947-branch-1.patch, 
> HBASE-16947-branch-1.patch, HBASE-16947-v1.patch, HBASE-16947.patch
>
>
> Recently we hit a too-many-replication-WALs problem in our production cluster. 
> We needed the DumpReplicationQueues tool to analyze the replication queue info 
> in zookeeper, so I backported HBASE-16450 to our 0.98-based branch and made 
> some improvements to it.
> 1. Show the dead regionservers under the replication/rs znode. When there are 
> too many WALs under a znode, they can't be atomically transferred to the new 
> rs znode, so the dead rs znode is left on zookeeper.
> 2. Summarize all the queues whose peer has been deleted.
> 3. Aggregate the replication queue sizes across all regionservers. Each 
> regionserver reports ReplicationLoad to the master, but there is no aggregate 
> metric for replication.
> 4. Show how many WALs cannot be found on HDFS. The reason for the "WAL Not 
> Found" state needs more digging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation

2016-11-06 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642957#comment-15642957
 ] 

Jianwei Cui commented on HBASE-17026:
-

The patch looks good to me [~tedyu]. BTW, because the rowkey may be an 
unreadable binary array, do we need to use 'Bytes.toStringBinary(...)' to print 
the rowkey?
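
For instance, a sketch of the suggested change (variable names are 
illustrative; Bytes.toStringBinary is the existing HBase utility):
{code}
// Print row keys with Bytes.toStringBinary so non-printable bytes
// survive in the log instead of corrupting it.
LOG.info("Good row key: " + delimiter + Bytes.toStringBinary(value.getRow()) + delimiter);
{code}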

> VerifyReplication log should distinguish whether good row key is result of 
> revalidation
> ---
>
> Key: HBASE-17026
> URL: https://issues.apache.org/jira/browse/HBASE-17026
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 17026.v1.txt
>
>
> Inspecting app log from VerifyReplication, I saw lines in the following form:
> {code}
> 2016-11-03 15:28:44,877 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: Good row 
> key: X000  X
> {code}
> where 'X' is the delimiter.
> Without line number, it is difficult to tell whether the good row has gone 
> through revalidation.
> This issue is to distinguish the two logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17010) Serial replication should handle daughter regions being assigned to another RS

2016-11-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642974#comment-15642974
 ] 

Phil Yang commented on HBASE-17010:
---

{quote}
There is no limit for the duration of the loop ?
{quote}
We should wait forever until the parent region's log has been pushed, or the 
ordering will be broken.
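
For illustration, a minimal sketch of such an unbounded wait (the helper and 
variable names are hypothetical, not the actual ReplicationSourceManager code):
{code}
// Hypothetical sketch: do not ship a daughter region's WAL entries until
// every entry of the parent region has been pushed to the peer. There is
// deliberately no timeout: giving up would break the serial ordering.
while (!isParentLogPushed(parentEncodedRegionName)) {
  Thread.sleep(sleepIntervalMs);
}
{code}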

> Serial replication should handle daughter regions being assigned to another RS
> --
>
> Key: HBASE-17010
> URL: https://issues.apache.org/jira/browse/HBASE-17010
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Phil Yang
> Attachments: HBASE-17010-v1.patch
>
>
> testRegionSplit and testRegionMerge were temporarily disabled by HBASE-16975.
> HBASE-9465 has an assumption that when we split a region, the two daughter 
> regions are on the same RS as the parent region. But after HBASE-14551 went 
> in, daughter regions may be assigned to another RS directly.
> This issue is to handle the new behavior and reenable the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17010) Serial replication should handle daughter regions being assigned to another RS

2016-11-06 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-17010:
--
Attachment: HBASE-17010-v2.patch

Fix according to reviews

> Serial replication should handle daughter regions being assigned to another RS
> --
>
> Key: HBASE-17010
> URL: https://issues.apache.org/jira/browse/HBASE-17010
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Phil Yang
> Attachments: HBASE-17010-v1.patch, HBASE-17010-v2.patch
>
>
> testRegionSplit and testRegionMerge were temporarily disabled by HBASE-16975.
> HBASE-9465 has an assumption that when we split a region, the two daughter 
> regions are on the same RS as the parent region. But after HBASE-14551 went 
> in, daughter regions may be assigned to another RS directly.
> This issue is to handle the new behavior and reenable the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation

2016-11-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17026:
---
Attachment: 17026.v2.txt

> VerifyReplication log should distinguish whether good row key is result of 
> revalidation
> ---
>
> Key: HBASE-17026
> URL: https://issues.apache.org/jira/browse/HBASE-17026
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 17026.v1.txt, 17026.v2.txt
>
>
> Inspecting app log from VerifyReplication, I saw lines in the following form:
> {code}
> 2016-11-03 15:28:44,877 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: Good row 
> key: X000  X
> {code}
> where 'X' is the delimiter.
> Without line number, it is difficult to tell whether the good row has gone 
> through revalidation.
> This issue is to distinguish the two logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17038) Allow verifyrep to persist row keys

2016-11-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-17038:
--

 Summary: Allow verifyrep to persist row keys
 Key: HBASE-17038
 URL: https://issues.apache.org/jira/browse/HBASE-17038
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


I have been involved in a case where a user of verifyrep observed a fluctuating 
row key count during successive runs with a time range.

If verifyrep can persist the row keys of the current run (to e.g. HDFS, 
controlled by a flag), successive runs would be able to use this information 
to find the difference in row keys w.r.t. the previous run and perform some 
actions (such as a raw scan, etc.).

This would allow the user to correlate results from successive runs better.
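
A hedged sketch of the persistence idea (the output path, runId, and the 
collected-keys variable are illustrative assumptions, not part of the tool 
today):
{code}
// Hypothetical: dump each verified row key to HDFS so the next run can diff.
FileSystem fs = FileSystem.get(conf);
try (FSDataOutputStream out = fs.create(new Path("/verifyrep/" + runId + "/rowkeys"))) {
  for (byte[] rowKey : verifiedRowKeys) {
    out.write(Bytes.toStringBinary(rowKey).getBytes(StandardCharsets.UTF_8));
    out.write('\n');
  }
}
{code}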



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)
Charlie Qiangeng Xu created HBASE-17039:
---

 Summary: SimpleLoadBalancer schedules large amount of invalid 
region moves
 Key: HBASE-17039
 URL: https://issues.apache.org/jira/browse/HBASE-17039
 Project: HBase
  Issue Type: Bug
  Components: Balancer
Affects Versions: 1.2.3, 1.1.6, 2.0.0
Reporter: Charlie Qiangeng Xu
Assignee: Charlie Qiangeng Xu
 Fix For: 2.0.0, 1.2.3, 1.1.6


After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 3 thousand) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Description: 
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.



  was:
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 3 thousand) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.




> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.1.6, 1.2.3
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Fix For: 2.0.0, 1.1.6, 1.2.3
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> amount of invalid region moves (more than 30k) fired by the balance chore. 
> We simulated the problem and printed out the balance plan, only to find that 
> many servers which had two regions for a certain table (we use the by-table 
> strategy) sent both regions out to two other servers that had zero regions. 
> In SimpleLoadBalancer's balanceCluster function, the code block that 
> determines the underLoadedServers might have a problem:
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which causes the problem mentioned above. Since we 
> increased the cluster's size to 1600+, many tables that only have 1000 
> regions now encounter this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Description: 
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.



  was:
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.




> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.1.6, 1.2.3
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Fix For: 2.0.0, 1.1.6, 1.2.3
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> amount of invalid region moves (more than 30k) fired by the balance chore. 
> We simulated the problem and printed out the balance plan, only to find that 
> many servers which had two regions for a certain table (we use the by-table 
> strategy) sent both regions out to two other servers that had zero regions. 
> In SimpleLoadBalancer's balanceCluster function, the code block that 
> determines the underLoadedServers might have a problem:
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which causes the problem mentioned above. Since we 
> increased the cluster's size to 1600+, many tables that only have 1000 
> regions now encounter this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Description: 
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.
By fixing it up, the balance plan went back to normal.
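
A minimal sketch of one possible guard (an illustration of the idea; not 
necessarily the committed patch):
{code}
if (load >= min) {
  // Also covers min == 0: a zero-load server is already at min,
  // so it must not be treated as underloaded.
  continue; // look for other servers which haven't reached min
}
int regionsToPut = min - load; // strictly positive here; no need to force it to 1
{code}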


  was:
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers which had two regions for a certain table (we use the by-table 
strategy) sent both regions out to two other servers that had zero regions. 
In SimpleLoadBalancer's balanceCluster function, the code block that 
determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which causes the problem mentioned above. Since we 
increased the cluster's size to 1600+, many tables that only have 1000 regions 
now encounter this issue.




> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.1.6, 1.2.3
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Fix For: 2.0.0, 1.1.6, 1.2.3
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> amount of invalid region moves (more than 30k) fired by the balance chore. 
> We simulated the problem and printed out the balance plan, only to find that 
> many servers which had two regions for a certain table (we use the by-table 
> strategy) sent both regions out to two other servers that had zero regions. 
> In SimpleLoadBalancer's balanceCluster function, the code block that 
> determines the underLoadedServers might have a problem:
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which causes the problem mentioned above. Since we 
> increased the cluster's size to 1600+, many tables that only have 1000 
> regions now encounter this issue.
> By fixing it up, the balance plan went back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17010) Serial replication should handle daughter regions being assigned to another RS

2016-11-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643065#comment-15643065
 ] 

Ted Yu commented on HBASE-17010:


For the loop in ReplicationSourceManager, is it possible to display a warning 
on the UI after a certain period has passed?

What do you think?

> Serial replication should handle daughter regions being assigned to another RS
> --
>
> Key: HBASE-17010
> URL: https://issues.apache.org/jira/browse/HBASE-17010
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Phil Yang
> Attachments: HBASE-17010-v1.patch, HBASE-17010-v2.patch
>
>
> testRegionSplit and testRegionMerge were temporarily disabled by HBASE-16975.
> HBASE-9465 has an assumption that when we split a region, the two daughter 
> regions are on the same RS as the parent region. But after HBASE-14551 went 
> in, daughter regions may be assigned to another RS directly.
> This issue is to handle the new behavior and reenable the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643068#comment-15643068
 ] 

Hadoop QA commented on HBASE-17036:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 5s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 31s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 120m 5s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
| Timed out junit tests | org.apache.hadoop.hbase.constraint.TestConstraint |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowAndColumnRangeFilter |
|   | org.apache.hadoop.hbase.TestNamespace |
|   | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837682/HBASE-17036.v1.patch |
| JIRA Issue | HBASE-17036 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 79af5bb0b270 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/compo

[jira] [Commented] (HBASE-17010) Serial replication should handle daughter regions being assigned to another RS

2016-11-06 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643072#comment-15643072
 ] 

Phil Yang commented on HBASE-17010:
---

Sounds good. I think we can add a warning on the UI if any replication source 
has been blocked for too long or has accumulated too many logs? Not only for 
serial replication.

> Serial replication should handle daughter regions being assigned to another RS
> --
>
> Key: HBASE-17010
> URL: https://issues.apache.org/jira/browse/HBASE-17010
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Phil Yang
> Attachments: HBASE-17010-v1.patch, HBASE-17010-v2.patch
>
>
> testRegionSplit and testRegionMerge were temporarily disabled by HBASE-16975.
> HBASE-9465 has an assumption that when we split a region, the two daughter 
> regions are on the same RS as the parent region. But after HBASE-14551 went 
> in, daughter regions may be assigned to another RS directly.
> This issue is to handle the new behavior and reenable the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643086#comment-15643086
 ] 

Hadoop QA commented on HBASE-17037:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
30s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 22s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 19s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 121m 16s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestFromClientSide |
|   | org.apache.hadoop.hbase.client.TestAsyncGetMultiThread |
|   | org.apache.hadoop.hbase.client.TestHCM |
|   | org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837683/17037.v3.txt |
| JIRA Issue | HBASE-17037 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux bcf23c1b3871 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4353/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/4353/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Re

[jira] [Commented] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-06 Thread liubangchen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643092#comment-15643092
 ] 

liubangchen commented on HBASE-16993:
-

The bucket sizes must be a multiple of 1024, else you will run into 
'java.io.IOException: Invalid HFile block magic' when you go to read from the 
cache. If you specify no values here, you pick up the default bucket sizes set 
in code (see BucketAllocator#DEFAULT_BUCKET_SIZES).
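
For example, a bucket.sizes setting in which every entry is a multiple of 
1024; note that 46000 in the configuration quoted below is not, and 46080 
replaces it here purely as an illustration:
{noformat}
<property>
  <name>hbase.bucketcache.bucket.sizes</name>
  <!-- every size a multiple of 1024; 46080 stands in for the offending 46000 -->
  <value>16384,32768,40960,46080,49152,51200,65536,131072,524288</value>
</property>
{noformat}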

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> <property>
>   <name>hbase.bucketcache.bucket.sizes</name>
>   <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
>   <name>hbase.bucketcache.size</name>
>   <value>16384</value>
> </property>
> <property>
>   <name>hbase.bucketcache.ioengine</name>
>   <value>offheap</value>
> </property>
> <property>
>   <name>hfile.block.cache.size</name>
>   <value>0.3</value>
> </property>
> <property>
>   <name>hfile.block.bloom.cacheonwrite</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hbase.rs.cacheblocksonwrite</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hfile.block.index.cacheonwrite</name>
>   <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase

[jira] [Issue Comment Deleted] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-06 Thread liubangchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liubangchen updated HBASE-16993:

Comment: was deleted

(was: Must be a multiple of 1024 else you will run into
'java.io.IOException: Invalid HFile block magic' when you go to read from 
cache.
If you specify no values here, then you pick up the default bucketsizes set
in code (See BucketAllocator#DEFAULT_BUCKET_SIZES).)

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> <property>
> <name>hbase.bucketcache.bucket.sizes</name>
> <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
> <name>hbase.bucketcache.size</name>
> <value>16384</value>
> </property>
> <property>
> <name>hbase.bucketcache.ioengine</name>
> <value>offheap</value>
> </property>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0.3</value>
> </property>
> <property>
> <name>hfile.block.bloom.cacheonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hfile.block.index.cacheonwrite</name>
> <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServic

[jira] [Resolved] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-06 Thread liubangchen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liubangchen resolved HBASE-16993.
-
Resolution: Not A Bug

Bucket sizes must be a multiple of 1024, else you will run into
'java.io.IOException: Invalid HFile block magic' when you go to read from
cache.
If you specify no values here, then you pick up the default bucket sizes set
in code (see BucketAllocator#DEFAULT_BUCKET_SIZES).

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> <property>
> <name>hbase.bucketcache.bucket.sizes</name>
> <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
> <name>hbase.bucketcache.size</name>
> <value>16384</value>
> </property>
> <property>
> <name>hbase.bucketcache.ioengine</name>
> <value>offheap</value>
> </property>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0.3</value>
> </property>
> <property>
> <name>hfile.block.bloom.cacheonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hfile.block.index.cacheonwrite</name>
> <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.g

[jira] [Commented] (HBASE-17036) Unify HTable#checkAndPut with AP

2016-11-06 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643104#comment-15643104
 ] 

ChiaPing Tsai commented on HBASE-17036:
---

All failed tests pass locally.

> Unify HTable#checkAndPut with AP
> 
>
> Key: HBASE-17036
> URL: https://issues.apache.org/jira/browse/HBASE-17036
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17036.v0.patch, HBASE-17036.v1.patch
>
>
> The implementation is similar to HTable#checkAndDelete.
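
For readers unfamiliar with the API being unified: checkAndPut is an atomic
compare-and-set on a single row. A minimal usage sketch against the public 1.x
Table interface; the table, row, and column names are placeholders:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t1"))) {
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("new"));
      // The Put is applied only if f:q currently holds "expected".
      boolean applied = table.checkAndPut(Bytes.toBytes("row1"),
          Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("expected"), put);
      System.out.println("applied = " + applied);
    }
  }
}
{code}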



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-17039:
--
Description: 
After increasing one of our clusters to 1600 nodes, we observed a large number 
of invalid region moves (more than 30k) fired by the balance chore. We 
simulated the problem and printed out the balance plan, only to find that many 
servers holding two regions of a certain table (we balance by table) sent both 
regions to two other servers holding zero regions. 
In SimpleLoadBalancer's balanceCluster function,
the code block that determines the underLoadedServers might have a problem:
{code}
  if (load >= min && load > 0) {
    continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0) {
    regionsToPut = 1;
  }
{code}
If min is zero, a server with a load of zero, which equals min, would be 
marked as underloaded, which would cause the phenomenon mentioned above.
Since we increased the cluster's size to 1600+, many tables with only 1000 
regions now encounter this issue.
Fixing it brought the balance plan back to normal.


  was:
After increasing one of our clusters to 1600 nodes, we observed a large amount 
of invalid region moves(more than 30k moves) fired by the balance chore. Thus 
we simulated the problem and printed out the balance plan, only to find out 
many servers that had two regions for a certain table(we use by table 
strategy), sent out both regions to other two servers that have zero region. 
In the SimpleLoadBalancer's balanceCluster function,
the code block that determines the underLoadedServers might have a problem:
  if (load >= min && load > 0) {
continue; // look for other servers which haven't reached min
  }
  int regionsToPut = min - load;
  if (regionsToPut == 0)
  {
regionsToPut = 1;
  }
if min is zero, some server that has load of zero, which equals to min would be 
marked as underloaded, which would cause the phenomenon mentioned above.
Since we increased the cluster's size to 1600+, many tables that only have 1000 
regions, now would encounter such issue.
By fixing it up, the balance plan went back to normal.



> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.1.6, 1.2.3
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Fix For: 2.0.0, 1.1.6, 1.2.3
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.
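
For readers of this digest, a minimal sketch of the kind of fix the description
implies (not the attached patch): never treat a server whose load has already
reached min as underloaded, even when min is zero.
{code}
// Sketch only. Given a server's current load and the per-table min, compute
// how many regions it should receive; a zero-load server with min == 0 now
// gets 0 instead of being bumped to 1 and marked underloaded.
static int regionsToPut(int load, int min) {
  if (load >= min) {
    return 0; // already at or above min, even when min == 0
  }
  return min - load; // strictly positive here, no special-case bump needed
}
{code}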



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-17039:
--
Affects Version/s: (was: 1.1.6)
   1.1.7
Fix Version/s: (was: 1.2.3)
   (was: 1.1.6)
   (was: 2.0.0)

Please upload the patch so people can better understand the fix. Thanks, 
[~xharlie].

> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.3, 1.1.7
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-16993:
---

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> <property>
> <name>hbase.bucketcache.bucket.sizes</name>
> <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
> <name>hbase.bucketcache.size</name>
> <value>16384</value>
> </property>
> <property>
> <name>hbase.bucketcache.ioengine</name>
> <value>offheap</value>
> </property>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0.3</value>
> </property>
> <property>
> <name>hfile.block.bloom.cacheonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hfile.block.index.cacheonwrite</name>
> <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1935)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32381)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:10

[jira] [Commented] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643147#comment-15643147
 ] 

stack commented on HBASE-16993:
---

It has to be a multiple of 1024?  I took a look in the refguide and don't see 
anything to that effect. I think it a bug [~liubangchen]. Let me reopen, if only 
to check the config must be a 1024 multiple and to update the refguide. Dump any 
of your findings in here please.

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> <property>
> <name>hbase.bucketcache.bucket.sizes</name>
> <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
> <name>hbase.bucketcache.size</name>
> <value>16384</value>
> </property>
> <property>
> <name>hbase.bucketcache.ioengine</name>
> <value>offheap</value>
> </property>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0.3</value>
> </property>
> <property>
> <name>hfile.block.bloom.cacheonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hfile.block.index.cacheonwrite</name>
> <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.ge

[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Attachment: HBASE-17039_V1.patch

> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.3, 1.1.7
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17039_V1.patch
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Attachment: (was: HBASE-17039_V1.patch)

> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.3, 1.1.7
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Qiangeng Xu updated HBASE-17039:

Attachment: HBASE-17039.patch

> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.3, 1.1.7
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17039.patch
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves

2016-11-06 Thread Charlie Qiangeng Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643194#comment-15643194
 ] 

Charlie Qiangeng Xu commented on HBASE-17039:
-

Just uploaded the patch for 2.0 :)

> SimpleLoadBalancer schedules large amount of invalid region moves
> -
>
> Key: HBASE-17039
> URL: https://issues.apache.org/jira/browse/HBASE-17039
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.3, 1.1.7
>Reporter: Charlie Qiangeng Xu
>Assignee: Charlie Qiangeng Xu
> Attachments: HBASE-17039.patch
>
>
> After increasing one of our clusters to 1600 nodes, we observed a large 
> number of invalid region moves (more than 30k) fired by the balance chore. We 
> simulated the problem and printed out the balance plan, only to find that 
> many servers holding two regions of a certain table (we balance by table) 
> sent both regions to two other servers holding zero regions. 
> In SimpleLoadBalancer's balanceCluster function,
> the code block that determines the underLoadedServers might have a problem:
> {code}
>   if (load >= min && load > 0) {
>     continue; // look for other servers which haven't reached min
>   }
>   int regionsToPut = min - load;
>   if (regionsToPut == 0) {
>     regionsToPut = 1;
>   }
> {code}
> If min is zero, a server with a load of zero, which equals min, would be 
> marked as underloaded, which would cause the phenomenon mentioned above.
> Since we increased the cluster's size to 1600+, many tables with only 1000 
> regions now encounter this issue.
> Fixing it brought the balance plan back to normal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15513) hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default

2016-11-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15513:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: MSLAB chunk pool is on by default in hbase-2.0.0.
  Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the patch, [~vrodionov]. Let's see how we do.

> hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default
> --
>
> Key: HBASE-15513
> URL: https://issues.apache.org/jira/browse/HBASE-15513
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-15513-v1.patch
>
>
> That results in excessive MemStoreLAB chunk allocations because we cannot 
> reuse them. Not sure why it has been disabled by default. Maybe the code 
> has not been tested well?
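
For anyone who wants to set the knob explicitly rather than rely on the new
default, a hedged sketch (0.2 below is an arbitrary example fraction, not a
recommendation):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ChunkPoolConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fraction of the global memstore size the MSLAB chunk pool may retain;
    // 0.0 disables pooling, which is the old default this issue changes.
    conf.setFloat("hbase.hregion.memstore.chunkpool.maxsize", 0.2f);
  }
}
{code}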



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643196#comment-15643196
 ] 

stack commented on HBASE-16890:
---

I tried 90 for the ratio on @duo zhang's patch and it took half as long again... 
780 seconds, and at a ratio of 10, it took 650-odd. I reran at 50 and it came 
out again at 570 or so, so the default seems best. Which one of the attached is 
your patch [~ram_krish], so I can try it, boss?

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-06 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643198#comment-15643198
 ] 

Mikhail Antonov commented on HBASE-17032:
-

Re-ran both failed tests locally several times; all passed.

Pushed to 1.3, branch-1 and master.

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, 
> HBASE-17032.branch-1.3.v2.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The 
> server throws CallQueueTooBigException or CallDroppedException (from the 
> deadline scheduler) when it feels overloaded. The client should accept that 
> behavior and retry. When the server sheds load and the client also bails out, 
> the load shedding bubbles up too far, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.
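
Until the fix is picked up, note that preemptive fast fail is opt-in and can
simply be left off; a hedged sketch (the key is the one introduced in the
HBASE-15137 era, so double-check it against your client version):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DisableFastFail {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // With PFFE off, CQTBE/CallDroppedException fall back to normal retries.
    conf.setBoolean("hbase.client.fast.fail.mode.enabled", false);
  }
}
{code}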



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-06 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, 
> HBASE-17032.branch-1.3.v2.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail 
> exception on the client. 
> It seems those 2 load control mechanists don't exactly align here. Server 
> throws CallQueueTooBigException, CallDroppedException (from deadline 
> scheduler) when it feels overloaded. Client should accept that behavior and 
> retry. When servers sheds the load, and client also bails out, the load 
> shedding  bubbles up too high and high level impact on the client 
> applications seems worse with PFFE turned on then without.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17031) Scanners should check for null start and end rows

2016-11-06 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643203#comment-15643203
 ] 

Ashish Singhi commented on HBASE-17031:
---

Dup of HBASE-16498? [~pankaj2461]

> Scanners should check for null start and end rows
> -
>
> Key: HBASE-17031
> URL: https://issues.apache.org/jira/browse/HBASE-17031
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Reporter: Ashu Pachauri
>Priority: Minor
>
> If a scan is passed with a null start row, it fails very deep in the call 
> stack. We should validate that the start and end rows are not null before 
> launching the scan.
> Here is the associated jstack:
> {code}
> java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
>   at 
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
>   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:798)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1225)
>   at 
> org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:158)
>   at 
> org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:147)
>   at 
> org.apache.hadoop.hbase.types.CopyOnWriteArrayMap$ArrayHolder.find(CopyOnWriteArrayMap.java:892)
>   at 
> org.apache.hadoop.hbase.types.CopyOnWriteArrayMap.floorEntry(CopyOnWriteArrayMap.java:169)
>   at 
> org.apache.hadoop.hbase.client.MetaCache.getCachedLocation(MetaCache.java:79)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getCachedLocation(ConnectionManager.java:1391)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1231)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1183)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
>   at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211)
>   ... 30 more
> {code}
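
A minimal sketch of the guard the description asks for; Scan and its accessors
are public API, while the validator itself is hypothetical and not the
committed fix:
{code}
import org.apache.hadoop.hbase.client.Scan;

final class ScanValidator {
  // Fail fast with a clear message instead of an NPE deep inside MetaCache.
  static void validate(Scan scan) {
    if (scan.getStartRow() == null || scan.getStopRow() == null) {
      throw new IllegalArgumentException(
          "Scan start/stop row must not be null; use HConstants.EMPTY_START_ROW"
              + " / HConstants.EMPTY_END_ROW for an unbounded scan");
    }
  }
}
{code}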



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643207#comment-15643207
 ] 

stack commented on HBASE-15560:
---

How do I do the 'access trace' [~ben.manes]? Let me know how I do this so I can 
pass you what you need.

How do I do 'weights', sir? I'm just doing ycsb workload c w/ the zipfian flag.
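
Hedging, since Ben can answer properly: in Caffeine, 'weights' means the
capacity is a total weight budget rather than an entry count, with a weigher
charging each entry its own cost. A minimal sketch with placeholder numbers:
{code}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class WeightedCacheSketch {
  public static void main(String[] args) {
    Cache<String, byte[]> cache = Caffeine.newBuilder()
        .maximumWeight(1L << 30)                             // ~1 GB total budget
        .weigher((String key, byte[] block) -> block.length) // per-entry cost
        .build();
    cache.put("block-1", new byte[64 * 1024]); // charges 64 KB against the budget
  }
}
{code}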

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
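
A toy rendering of the admission rule described above; freq() stands in for
the counting-sketch estimate and is not Caffeine's actual API:
{code}
// Keep whichever of candidate/victim has the higher estimated frequency.
static <K> boolean admit(K candidate, K victim,
    java.util.function.ToIntFunction<K> freq) {
  return freq.applyAsInt(candidate) > freq.applyAsInt(victim);
}
{code}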



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17010) Serial replication should handle daughter regions being assigned to another RS

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643211#comment-15643211
 ] 

Hadoop QA commented on HBASE-17010:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
9s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 41s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 41s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 12s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
38s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 134m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.replication.TestReplicationStateHBaseImpl |
|   | org.apache.hadoop.hbase.TestZooKeeper |
|   | org.apache.hadoop.hbase.master.TestHMasterRPCException |
|   | org.apache.hadoop.hbase.master.TestTableLockManager |
|   | org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache |
|   | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837688/HBASE-17010-v2.patch |
| JIRA Issue | HBASE-17010 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 0233b643d150 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Created] (HBASE-17040) HBase Spark does not work in Kerberos and yarn-master mode

2016-11-06 Thread Binzi Cao (JIRA)
Binzi Cao created HBASE-17040:
-

 Summary: HBase Spark does not work in Kerberos and yarn-master mode
 Key: HBASE-17040
 URL: https://issues.apache.org/jira/browse/HBASE-17040
 Project: HBase
  Issue Type: Bug
  Components: spark
Affects Versions: 1.2.0
 Environment: HBase
Kerberos
Yarn
Cloudera
Reporter: Binzi Cao


We are loading HBase records into an RDD with the hbase-spark library in Cloudera.

The hbase-spark code works if we submit the job in client mode, but does not
work in cluster mode. We get the exceptions below:
{code}
16/11/07 05:43:28 WARN security.UserGroupInformation: 
PriviledgedActionException as:spark (auth:SIMPLE) 
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
16/11/07 05:43:28 WARN ipc.RpcClientImpl: Exception encountered while 
connecting to the server : javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
16/11/07 05:43:28 ERROR ipc.RpcClientImpl: SASL authentication failed. The most 
likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:181)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1242)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:34118)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1627)
at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:92)
at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:89)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
at 
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
at 
org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
at 
org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
at 
org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:111)
at 
org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:108)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:340)
at 
org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:108)
at 
org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:329)
at 
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:490)
at 
org.apache.hadoop.hbase.spark.HBaseContext.<init>(HBaseContext.scala:70)
{code}
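
A commonly suggested workaround, offered here only as a hedged sketch (not a
verified fix for this report): log in from a keytab inside the driver before
creating the HBaseContext, so a Kerberos TGT exists in yarn-cluster mode where
the submitting user's ticket cache is absent. The principal and keytab path
below are placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class ClusterModeLogin {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // Hypothetical principal and keytab, shipped alongside the job:
    UserGroupInformation.loginUserFromKeytab(
        "spark/host.example.com@EXAMPLE.COM", "/etc/security/keytabs/spark.keytab");
    // ...create the HBase Connection / HBaseContext under this login...
  }
}
{code}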




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643225#comment-15643225
 ] 

ramkrishna.s.vasudevan commented on HBASE-16890:


https://issues.apache.org/jira/secure/attachment/12837537/AsyncWAL_disruptor_7.patch.
You can try this. 

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643227#comment-15643227
 ] 

Ben Manes commented on HBASE-15560:
---

Sorry for not getting to this over the weekend. A bit of a family scare which 
had a happy ending.

An access trace is a log of the key hashes on a {{get}}. Then I can replay them 
offline with the simulator. The "weight" of an entry in {{workloadc}} claims to 
be 1kb uniformly. I wasn't sure if they were going to vary, e.g. with large and 
small entries across the distribution.

I do have YCSB integrated into the simulator for synthetic distributions, so 
perhaps I can try to reproduce your observations that way.
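As a minimal illustration of the trace being asked for (a hypothetical recorder, 
not HBase's BlockCache API): log a surrogate hash of each requested key on a 
{{get}}, one per line, and the file can then be replayed offline.

{code}
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical trace recorder: logs a surrogate (the key's hash) for every
// cache access, one per line, so no key data leaks into the trace.
public final class TraceRecorder implements AutoCloseable {
  private final PrintWriter out;

  public TraceRecorder(String path) throws IOException {
    this.out = new PrintWriter(path);
  }

  // Call on every block-cache get.
  public void record(byte[] blockKey) {
    out.println(java.util.Arrays.hashCode(blockKey));
  }

  @Override
  public void close() {
    out.close();
  }
}
{code}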

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643235#comment-15643235
 ] 

stack commented on HBASE-16890:
---

Found it over on HBASE-17021. I tried it. Took a little more than the Duo 
patch.. about 580 instead of 570... So about the same.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17035) Check why we roll a wal writer at 10MB when the configured roll size is 120M+ with AsyncFSWAL

2016-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643236#comment-15643236
 ] 

ramkrishna.s.vasudevan commented on HBASE-17035:


I am just about to analyze your findings. If I find anything, I will report back. 

> Check why we roll a wal writer at 10MB when the configured roll size is 120M+ 
> with AsyncFSWAL
> -
>
> Key: HBASE-17035
> URL: https://issues.apache.org/jira/browse/HBASE-17035
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
>
> Found this when addressing HBASE-16890. It is one of the possible reason that 
> why AsyncFSWAL performs worse than FSHLog when running PE tool.
> https://issues.apache.org/jira/browse/HBASE-16890?focusedCommentId=15636688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15636688



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643238#comment-15643238
 ] 

ramkrishna.s.vasudevan commented on HBASE-16890:


Ya, that can be made equal. But I avoided some optimizations due to some test 
case issues that happen directly on the region.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643250#comment-15643250
 ] 

stack commented on HBASE-16890:
---

I've not looked... are the two approaches different? Can we unify?

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643247#comment-15643247
 ] 

stack commented on HBASE-15560:
---

bq. Sorry for not getting to this over the weekend. A bit of a family scare 
which had a happy ending.

Good.

bq. An access trace is a log of the key hashes on a get. Then I can replay them 
offline with the simulator. The "weight" of an entry in workloadc claims to be 
1kb uniformly. I wasn't sure if they were going to vary, e.g with large and 
small across the distribution.

Would you want the same dataset loaded too?

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643251#comment-15643251
 ] 

stack commented on HBASE-16890:
---

Or, why are we twice as slow? It's reasonable to think that AsyncWAL should be 
much faster?

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643265#comment-15643265
 ] 

Ben Manes commented on HBASE-15560:
---

{quote}
Would you want the same dataset loaded too?
{quote}

That can't hurt, so unless it's more work we might as well.

---

In my [simulator|https://github.com/ben-manes/caffeine/wiki/Simulator], I tried 
to emulate {{workload c}} using the following configuration:
 * maximum-size = (below)
 * source = "synthetic"
 * distribution = "zipfian"
 * zipfian.items = 1000

I then ran it with small caches to emulate your observation. {{LruBlockCache}} 
is an SLru variant, so I'm assuming it behaves similarly to the theoretical 
version.

||Policy||max=5||max=10||max=25||
|Lru|13.10%|20.70%|35.60%|
|SLru|25.90%|29.30%|45.00%|
|Caffeine|24.40%|32.30%|46.00%|
|Optimal|35.20%|42.10%|45.50%|

We see that at the smallest size, 5, Caffeine slightly underperforms. However, 
whether it's slightly lower, equal, or higher varies by run. This is due to the 
distribution generation and Caffeine's hashing having randomness, so across 
runs we see it pretty much on par. As the size increases we see them all stay 
pretty close. Since SLru is known to be optimal for Zipf, this at least is a 
good sign but does not explain your observations.
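For reference, a minimal sketch of this kind of offline replay, assuming a 
plain LRU policy and a naive inverse-CDF Zipfian sampler (illustrative only; 
the real simulator linked above is far more complete):

{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public final class ZipfReplay {
  public static void main(String[] args) {
    final int items = 1000, maxSize = 25, requests = 1_000_000;

    // Inverse-CDF table for a Zipf(1.0) distribution over `items` keys.
    double[] cdf = new double[items];
    double norm = 0;
    for (int i = 1; i <= items; i++) norm += 1.0 / i;
    double acc = 0;
    for (int i = 0; i < items; i++) {
      acc += (1.0 / (i + 1)) / norm;
      cdf[i] = acc;
    }

    // Bounded LRU cache via an access-ordered LinkedHashMap.
    Map<Integer, Integer> lru = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > maxSize;
      }
    };

    Random rnd = new Random(42);
    long hits = 0;
    for (int r = 0; r < requests; r++) {
      double u = rnd.nextDouble();
      int key = 0;
      while (key < items - 1 && cdf[key] < u) key++; // inverse-CDF sample
      if (lru.get(key) != null) {
        hits++;            // get() records the access for LRU ordering
      } else {
        lru.put(key, key); // miss: "load" and cache the block
      }
    }
    System.out.printf("LRU hit rate at max=%d: %.2f%%%n",
        maxSize, 100.0 * hits / requests);
  }
}
{code}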

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643270#comment-15643270
 ] 

Ben Manes commented on HBASE-15560:
---

In the last run, {{Optimal}} is {{55.50%}}. Sorry for the typo.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643272#comment-15643272
 ] 

stack commented on HBASE-15560:
---

Right. How are you going to simulate my case where there are lots of cache 
misses? Can I turn on logging or something for you? Seems pretty useless sending 
you a bunch of access keys if you don't have the same dataset loaded and the 
same hardware.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643271#comment-15643271
 ] 

Duo Zhang commented on HBASE-16890:
---

We both want to reduce the contention by introducing a ringbuffer, but in 
different ways.

Ram's patch aims to introduce a disruptor in front of the old 
waitingConsumePayloads: all payloads are first added to the ringbuffer, and 
then the disruptor event processor polls them from the ringbuffer and adds 
them to waitingConsumePayloads.

My patch aims to use the ringbuffer directly as waitingConsumePayloads and 
remove locking as much as possible.

Ideally Ram's patch should be small and simple (but we still have some problems 
with txid generation...) while my patch will be more complicated. But since I 
can also remove all the locking in append and sync, I think my patch can save 
one thread (the event processor thread in Ram's patch), which is a little 
better.

Thanks.
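A minimal sketch of the ringbuffer-as-queue idea using the LMAX Disruptor 
directly (illustrative only: the Payload type and wiring are hypothetical, 
and neither patch's actual code is shown):

{code}
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import java.util.concurrent.ThreadFactory;

// Hypothetical payload slot; for the WAL this would carry the edit and txid.
final class Payload {
  Object edit;
}

public final class RingBufferQueueSketch {
  public static void main(String[] args) {
    ThreadFactory tf = Thread::new;
    Disruptor<Payload> disruptor = new Disruptor<>(Payload::new, 1024, tf);
    // A single consumer drains slots in batches on its own thread.
    disruptor.handleEventsWith((payload, sequence, endOfBatch) -> {
      // consume: append/sync the edit here
    });
    RingBuffer<Payload> ring = disruptor.start();

    // Producers publish lock-free: claim a slot, fill it, publish it.
    long seq = ring.next();
    try {
      ring.get(seq).edit = "edit-1";
    } finally {
      ring.publish(seq);
    }
    disruptor.shutdown();
  }
}
{code}

The point in both approaches is that producers contend only on a CAS of the 
sequence counter instead of a lock, while one consumer thread applies the 
payloads in batches.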

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-06 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643283#comment-15643283
 ] 

Ben Manes commented on HBASE-15560:
---

Are the access keys not in the data set, so that they are not found? I assumed a 
miss means: query the cache, load, store into the cache. If queried again, it 
should be a cache hit.

If that's correct then the value has no meaning and the keys define the access 
distribution. Any surrogate, like a hash, will be representative. So using the 
same Zipf distribution should give us similar results.

But I might be mistaken about how the cache is used in HBase and be evaluating 
it incorrectly in isolation.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O( n ) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, for which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15513) hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default

2016-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643293#comment-15643293
 ] 

Anoop Sam John commented on HBASE-15513:


There is a JIRA to default the heap memory tuning to ON as well. That should 
also be committed now, as it will give a chance to release some of the buffers 
we keep in the ChunkPool when there are not many write requests at present and 
releasing memstore memory area is OK.
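For anyone following along, a hedged sketch of turning the pool on today (the 
0.5 value is illustrative only; the property takes a fraction of the global 
memstore size that may be kept as reusable chunks):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: enable the MemStoreLAB chunk pool, which per this issue is
// disabled by the 0.0 default. The 0.5 fraction here is illustrative.
final class EnableChunkPool {
  static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setFloat("hbase.hregion.memstore.chunkpool.maxsize", 0.5f);
    return conf;
  }
}
{code}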

> hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default
> --
>
> Key: HBASE-15513
> URL: https://issues.apache.org/jira/browse/HBASE-15513
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-15513-v1.patch
>
>
> That results in excessive MemStoreLAB chunk allocations because we can not 
> reuse them. Not sure, why it has been disabled, by default. May be the code 
> has not been tested well?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15513) hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default

2016-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643295#comment-15643295
 ] 

Anoop Sam John commented on HBASE-15513:


Do we need to mark this as an incompatible change, as there is a behavior 
change now w.r.t. handling of RS memory?

> hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default
> --
>
> Key: HBASE-15513
> URL: https://issues.apache.org/jira/browse/HBASE-15513
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-15513-v1.patch
>
>
> That results in excessive MemStoreLAB chunk allocations because we can not 
> reuse them. Not sure, why it has been disabled, by default. May be the code 
> has not been tested well?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16992) The usage of mutation from CP is weird.

2016-11-06 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-16992:
---
Fix Version/s: (was: 1.3.1)

> The usage of mutation from CP is weird.
> ---
>
> Key: HBASE-16992
> URL: https://issues.apache.org/jira/browse/HBASE-16992
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16992.branch-1.v0.patch, HBASE-16992.v0.patch
>
>
> {code:title=HRegion#doMiniBatchMutate|borderStyle=solid}
> Mutation cpMutation = cpMutations[j];
> Map> cpFamilyMap = cpMutation.getFamilyCellMap();
> checkAndPrepareMutation(cpMutation, replay, cpFamilyMap, now);
>  // Acquire row locks. If not, the whole batch will fail.
> acquiredRowLocks.add(getRowLockInternal(cpMutation.getRow(), true));
> if (cpMutation.getDurability() == Durability.SKIP_WAL) {
>   recordMutationWithoutWal(cpFamilyMap);
> }
> // Returned mutations from coprocessor correspond to the Mutation at index i. 
> We can
>  // directly add the cells from those mutations to the familyMaps of this 
> mutation.
> mergeFamilyMaps(familyMaps[i], cpFamilyMap); // will get added to the 
> memstore later
> {code}
> 1. Does the returned mutation from coprocessor have the same row as the 
> corresponded mutation? If so, the acquiredRowLocks() can be saved. If not, 
> the corresponded mutation may maintain the cells with different row due to 
> mergeFamilyMaps().
> 2. Is returned mutation's durability useful? If so, we should deal with the 
> different durabilities before mergeFamilyMaps(). If not, the 
> recordMutationWithoutWal can be saved. 
> 3. If both the returned mutation and corresponded mutation have 
> Durability.SKIP_WAL, the recordMutationWithoutWal() may record the duplicate 
> cells due to mergeFamilyMaps().
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-06 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643323#comment-15643323
 ] 

Duo Zhang commented on HBASE-16890:
---

With 3 DNs, AsyncFSWAL should be faster as we use fan-out. One DN is another 
story, as fan-out and chained replication are the same...

Anyway, two times slower is not acceptable. I think one possible reason is what 
I said above: AsyncFSWAL could sync more than FSHLog. Let me add a limit on the 
concurrent sync requests after we finish HBASE-17021.

Thanks.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16992) The usage of mutation from CP is weird.

2016-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643324#comment-15643324
 ] 

Anoop Sam John commented on HBASE-16992:


{code}
@VisibleForTesting
static long estimatedHeapSizeOf(final Cell cell) {
  // TODO we need include tags length also here.
  return KeyValueUtil.keyLength(cell) + cell.getValueLength();
}
{code}
Sorry, we cannot call it estimatedHeapSizeOf(Cell). The name says it is the 
heap space occupied by this Cell, and a similar API already exists in CellUtil. 
What we do here is calculate the length of the cell in KV format, and we don't 
include the tags length; you can see the TODO.
What you can do is use the API KeyValueUtil#length(Cell). This will include the 
tags length also; that is fine, we need that anyway.
In place of KeyValueUtil.keyLength(cell) + cell.getValueLength(), use the new 
length() call. You can use that from the test as well; no need to add an extra 
API within HRegion.
Otherwise OK.
Please attach the latest patch for branch-1 also.
Plan is to commit to branch-1 and trunk only.
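A sketch of the suggested replacement, grounded in the comment above 
(KeyValueUtil#length(Cell) gives the full cell length in KeyValue format, 
tags included; the wrapper class here is only for illustration):

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValueUtil;

final class CellSizeSketch {
  // Full serialized length of the cell in KeyValue format, tags included,
  // replacing KeyValueUtil.keyLength(cell) + cell.getValueLength().
  static long lengthOf(final Cell cell) {
    return KeyValueUtil.length(cell);
  }
}
{code}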


> The usage of mutation from CP is weird.
> ---
>
> Key: HBASE-16992
> URL: https://issues.apache.org/jira/browse/HBASE-16992
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16992.branch-1.v0.patch, HBASE-16992.v0.patch
>
>
> {code:title=HRegion#doMiniBatchMutate|borderStyle=solid}
> Mutation cpMutation = cpMutations[j];
> Map> cpFamilyMap = cpMutation.getFamilyCellMap();
> checkAndPrepareMutation(cpMutation, replay, cpFamilyMap, now);
>  // Acquire row locks. If not, the whole batch will fail.
> acquiredRowLocks.add(getRowLockInternal(cpMutation.getRow(), true));
> if (cpMutation.getDurability() == Durability.SKIP_WAL) {
>   recordMutationWithoutWal(cpFamilyMap);
> }
> // Returned mutations from coprocessor correspond to the Mutation at index i. 
> We can
>  // directly add the cells from those mutations to the familyMaps of this 
> mutation.
> mergeFamilyMaps(familyMaps[i], cpFamilyMap); // will get added to the 
> memstore later
> {code}
> 1. Does the returned mutation from coprocessor have the same row as the 
> corresponded mutation? If so, the acquiredRowLocks() can be saved. If not, 
> the corresponded mutation may maintain the cells with different row due to 
> mergeFamilyMaps().
> 2. Is returned mutation's durability useful? If so, we should deal with the 
> different durabilities before mergeFamilyMaps(). If not, the 
> recordMutationWithoutWal can be saved. 
> 3. If both the returned mutation and corresponded mutation have 
> Durability.SKIP_WAL, the recordMutationWithoutWal() may record the duplicate 
> cells due to mergeFamilyMaps().
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation

2016-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643349#comment-15643349
 ] 

Hadoop QA commented on HBASE-17026:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
2s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
46m 48s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 132m 19s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
34s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 202m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
| Timed out junit tests | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
|   | org.apache.hadoop.hbase.security.visibility.TestWithDisabledAuthorization 
|
|   | org.apache.hadoop.hbase.replication.TestMasterReplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837690/17026.v2.txt |
| JIRA Issue | HBASE-17026 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux ea42438add37 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Buil

[jira] [Commented] (HBASE-17035) Check why we roll a wal writer at 10MB when the configured roll size is 120M+ with AsyncFSWAL

2016-11-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643365#comment-15643365
 ] 

ramkrishna.s.vasudevan commented on HBASE-17035:


While loading 50G of data with 50 threads, AsyncFSWAL takes more than twice 
the time FSHLog does.
There are 126 log roll requests with FSHLog, but with AsyncFSWAL we have 215 
log roll requests. And we have a lot of log rolls at 10MB to 13MB sizes, and 
from what I got this is not due to memstore flush.
The interesting part is this:
{code}
2016-11-07 18:09:13,688 INFO  
[regionserver/stobdtserver5/10.224.54.65:16041.logRoller] wal.AbstractFSWAL: 
Rolled WAL 
/hbase1/WALs/stobdtserver5,16041,1478521203795/stobdtserver5%2C16041%2C1478521203795.1478522344435
 with entries=1566632, filesize=518.92 MB; new WAL 
/hbase1/WALs/stobdtserver5,16041,1478521203795/stobdtserver5%2C16041%2C1478521203795.1478522353615
2016-11-07 18:09:13,853 INFO  
[regionserver/stobdtserver5/10.224.54.65:16041.logRoller] wal.AbstractFSWAL: 
Rolled WAL 
/hbase1/WALs/stobdtserver5,16041,1478521203795/stobdtserver5%2C16041%2C1478521203795.1478522353615
 with entries=1566992, filesize=13.88 MB; new WAL 
/hbase1/WALs/stobdtserver5,16041,1478521203795/stobdtserver5%2C16041%2C1478521203795.1478522353689
{code}

The newly generated WAL file is immediately rolled to another file of size 
13.88MB; the total number of entries is almost the same, but the size is so 
small. This pattern repeats quite frequently. 

> Check why we roll a wal writer at 10MB when the configured roll size is 120M+ 
> with AsyncFSWAL
> -
>
> Key: HBASE-17035
> URL: https://issues.apache.org/jira/browse/HBASE-17035
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
>
> Found this when addressing HBASE-16890. It is one of the possible reason that 
> why AsyncFSWAL performs worse than FSHLog when running PE tool.
> https://issues.apache.org/jira/browse/HBASE-16890?focusedCommentId=15636688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15636688



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16989) RowProcess#postBatchMutate doesn’t be executed before the mvcc transaction completion

2016-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643382#comment-15643382
 ] 

Anoop Sam John commented on HBASE-16989:


Looks mostly good. The placement of the post hook is fine now.
In processRowsWithLocks() we deal with the try blocks now; an outer try block 
is removed.
Inside this outer try, we called getRowLock. That can time out and throw an 
exception so that no ops will happen. Previously the outer try's finally would 
do the closeRegionOperation. Now if you remove that and move the calls out of 
the try, it can be an issue. Can you just keep the original arrangement of 
try/finally blocks, and just move the call to the post hook from the old place 
to the correct new place? Always make sure to call the CP hooks within 
try/finally (which you are ensuring, I can see. Good).
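A hedged sketch of the try/finally shape being described (the method names are 
from HRegion; the bodies are elided and illustrative, not the actual patch):

{code}
// Skeleton only: shows why closeRegionOperation must stay in an outer
// finally, since getRowLockInternal may time out and throw before any
// inner try is entered.
startRegionOperation();
try {
  RowLock rowLock = getRowLockInternal(row, true); // may time out and throw
  try {
    // ... acquire updatesLock, process(), write WAL, complete mvcc ...
    processor.postBatchMutate(this); // CP hook stays inside try/finally
  } finally {
    rowLock.release();
  }
} finally {
  closeRegionOperation(); // runs even if getRowLock threw
}
{code}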

> RowProcess#postBatchMutate doesn’t be executed before the mvcc transaction 
> completion
> -
>
> Key: HBASE-16989
> URL: https://issues.apache.org/jira/browse/HBASE-16989
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-16989.v0.patch, HBASE-16989.v1.patch, 
> HBASE-16989.v2.patch
>
>
> After the [HBASE-15158|https://issues.apache.org/jira/browse/HBASE-15158], 
> RowProcess#postBatchMutate will be executed “after” the mvcc transaction 
> completion.
> {code:title=HRegion#processRowsWithLocks}
>   // STEP 8. Complete mvcc.
>   mvcc.completeAndWait(writeEntry);
>   writeEntry = null;
> 
>   // STEP 9. Release region lock
>   if (locked) {
> this.updatesLock.readLock().unlock();
> locked = false;
>   }
> 
>   // STEP 10. Release row lock(s)
>   releaseRowLocks(acquiredRowLocks);
> 
>   // STEP 11. call postBatchMutate hook
>   processor.postBatchMutate(this);
> {code}
> {code:title=RowProcess#postBatchMutate}
>   /**
>* The hook to be executed after the process() and applying the Mutations 
> to region. The
>* difference of this one with {@link #postProcess(HRegion, WALEdit, 
> boolean)} is this hook will
>* be executed before the mvcc transaction completion.
>*/
>   void postBatchMutate(HRegion region) throws IOException;
> {code}
> Do we ought to revamp the comment of RowProcess#postBatchMutate or change the 
> call order?
> I prefer the former, because the HRegion#doMiniBatchMutate() also call 
> postBatchMutate() after the mvcc transaction completion.
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16992) The usage of mutation from CP is weird.

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-16992:
--
Status: Open  (was: Patch Available)

> The usage of mutation from CP is weird.
> ---
>
> Key: HBASE-16992
> URL: https://issues.apache.org/jira/browse/HBASE-16992
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16992.branch-1.v0.patch, HBASE-16992.v0.patch
>
>
> {code:title=HRegion#doMiniBatchMutate|borderStyle=solid}
> Mutation cpMutation = cpMutations[j];
> Map> cpFamilyMap = cpMutation.getFamilyCellMap();
> checkAndPrepareMutation(cpMutation, replay, cpFamilyMap, now);
>  // Acquire row locks. If not, the whole batch will fail.
> acquiredRowLocks.add(getRowLockInternal(cpMutation.getRow(), true));
> if (cpMutation.getDurability() == Durability.SKIP_WAL) {
>   recordMutationWithoutWal(cpFamilyMap);
> }
> // Returned mutations from coprocessor correspond to the Mutation at index i. 
> We can
>  // directly add the cells from those mutations to the familyMaps of this 
> mutation.
> mergeFamilyMaps(familyMaps[i], cpFamilyMap); // will get added to the 
> memstore later
> {code}
> 1. Does the returned mutation from coprocessor have the same row as the 
> corresponded mutation? If so, the acquiredRowLocks() can be saved. If not, 
> the corresponded mutation may maintain the cells with different row due to 
> mergeFamilyMaps().
> 2. Is returned mutation's durability useful? If so, we should deal with the 
> different durabilities before mergeFamilyMaps(). If not, the 
> recordMutationWithoutWal can be saved. 
> 3. If both the returned mutation and corresponded mutation have 
> Durability.SKIP_WAL, the recordMutationWithoutWal() may record the duplicate 
> cells due to mergeFamilyMaps().
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17035) Check why we roll a wal writer at 10MB when the configured roll size is 120M+ with AsyncFSWAL

2016-11-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643389#comment-15643389
 ] 

Anoop Sam John commented on HBASE-17035:


Oh... That means there is something silly in how we calculate the number of 
cells added to the WAL(?) Based on that and the sum of cell lengths, we decide 
on the roll?
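For context, a hedged sketch of a size-based roll check (the field and method 
names here are illustrative, not AbstractFSWAL's actual ones). The roll 
decision rides on an accumulated-length counter, so an accounting slip there, 
e.g. not resetting it per writer, would skew when rolls fire:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative size-based WAL roll check; not HBase's actual code.
final class RollCheckSketch {
  private final AtomicLong totalLogSize = new AtomicLong();
  private final long logrollsize = 120L * 1024 * 1024; // e.g. ~120MB threshold

  // Accumulate this edit's serialized length and test the threshold.
  boolean postAppend(long entryLen) {
    return totalLogSize.addAndGet(entryLen) > logrollsize;
  }

  // Must reset per new writer, or subsequent rolls fire far too early.
  void onRoll() {
    totalLogSize.set(0);
  }
}
{code}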

> Check why we roll a wal writer at 10MB when the configured roll size is 120M+ 
> with AsyncFSWAL
> -
>
> Key: HBASE-17035
> URL: https://issues.apache.org/jira/browse/HBASE-17035
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
>
> Found this when addressing HBASE-16890. It is one of the possible reason that 
> why AsyncFSWAL performs worse than FSHLog when running PE tool.
> https://issues.apache.org/jira/browse/HBASE-16890?focusedCommentId=15636688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15636688



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16992) The usage of mutation from CP is weird.

2016-11-06 Thread ChiaPing Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChiaPing Tsai updated HBASE-16992:
--
Attachment: HBASE-16992.v1.patch

address [~anoop.hbase]'s comment

> The usage of mutation from CP is weird.
> ---
>
> Key: HBASE-16992
> URL: https://issues.apache.org/jira/browse/HBASE-16992
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16992.branch-1.v0.patch, HBASE-16992.v0.patch, 
> HBASE-16992.v1.patch
>
>
> {code:title=HRegion#doMiniBatchMutate|borderStyle=solid}
> Mutation cpMutation = cpMutations[j];
> Map> cpFamilyMap = cpMutation.getFamilyCellMap();
> checkAndPrepareMutation(cpMutation, replay, cpFamilyMap, now);
>  // Acquire row locks. If not, the whole batch will fail.
> acquiredRowLocks.add(getRowLockInternal(cpMutation.getRow(), true));
> if (cpMutation.getDurability() == Durability.SKIP_WAL) {
>   recordMutationWithoutWal(cpFamilyMap);
> }
> // Returned mutations from coprocessor correspond to the Mutation at index i. 
> We can
>  // directly add the cells from those mutations to the familyMaps of this 
> mutation.
> mergeFamilyMaps(familyMaps[i], cpFamilyMap); // will get added to the 
> memstore later
> {code}
> 1. Does the returned mutation from coprocessor have the same row as the 
> corresponded mutation? If so, the acquiredRowLocks() can be saved. If not, 
> the corresponded mutation may maintain the cells with different row due to 
> mergeFamilyMaps().
> 2. Is returned mutation's durability useful? If so, we should deal with the 
> different durabilities before mergeFamilyMaps(). If not, the 
> recordMutationWithoutWal can be saved. 
> 3. If both the returned mutation and corresponded mutation have 
> Durability.SKIP_WAL, the recordMutationWithoutWal() may record the duplicate 
> cells due to mergeFamilyMaps().
> Any comment? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

