[jira] [Commented] (HBASE-15952) Bulk load data replication is not working when RS user does not have permission on hfile-refs node

2016-06-05 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316224#comment-15316224
 ] 

Ashish Singhi commented on HBASE-15952:
---

Thanks for the comment, [~apurtell]. In our internal testing we did not find 
any other cases that cause this problem. The other znode the client creates 
directly, {{zookeeper.znode.replication.peers}}, is in any case created and 
deleted only by the client, so there is no issue there.

> Bulk load data replication is not working when RS user does not have 
> permission on hfile-refs node
> --
>
> Key: HBASE-15952
> URL: https://issues.apache.org/jira/browse/HBASE-15952
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.0
>Reporter: Ashish Singhi
>Assignee: Ashish Singhi
>
> In recent testing on a secure cluster we found that when the RS user does not 
> have permission on the hfile-refs znode, the RS fails to replicate bulk 
> loaded data, because the hfile-refs znode is created by the HBase client and 
> the RS user may not have permission on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15174) Client Public API should not have PB objects in 2.0

2016-06-05 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-15174:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Client Public API should not have PB objects in 2.0
> ---
>
> Key: HBASE-15174
> URL: https://issues.apache.org/jira/browse/HBASE-15174
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-15174.patch, HBASE-15174_1.patch, 
> HBASE-15174_2.patch, HBASE-15174_3.patch, HBASE-15174_4.patch, 
> HBASE-15174_5.patch
>
>
> Some more cleanup for the parent jira. 
> We have leaked some PB structs in Admin (and possibly other places). 
> We should clean up these APIs before 2.0.
> Examples include: 
> {code}
>   AdminProtos.GetRegionInfoResponse.CompactionState getCompactionState(
>       final TableName tableName) throws IOException;
>
>   void snapshot(final String snapshotName, final TableName tableName,
>       HBaseProtos.SnapshotDescription.Type type)
>       throws IOException, SnapshotCreationException, IllegalArgumentException;
>
>   MasterProtos.SnapshotResponse takeSnapshotAsync(
>       HBaseProtos.SnapshotDescription snapshot)
>       throws IOException, SnapshotCreationException;
> {code}
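A minimal, hypothetical sketch of the wrapping pattern that removes PB types from such signatures: expose a plain public enum and keep the PB conversion in one internal place. The names here are stand-ins, not the actual HBase classes:

```java
// PbCompactionState stands in for the generated protobuf enum
// AdminProtos.GetRegionInfoResponse.CompactionState.
enum PbCompactionState { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

// Public, PB-free enum exposed to clients instead of the generated type.
enum CompactionState {
  NONE, MINOR, MAJOR, MAJOR_AND_MINOR;

  // Conversion kept internal so PB never leaks into public signatures.
  static CompactionState fromPb(PbCompactionState pb) {
    return CompactionState.valueOf(pb.name());
  }
}

public class PbShieldSketch {
  public static void main(String[] args) {
    // A public Admin#getCompactionState would return CompactionState,
    // converting from the PB response internally.
    CompactionState s = CompactionState.fromPb(PbCompactionState.MAJOR);
    System.out.println(s); // MAJOR
  }
}
```

With this shape, the generated PB classes can change or be shaded without touching the public API.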





[jira] [Updated] (HBASE-15174) Client Public API should not have PB objects in 2.0

2016-06-05 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-15174:
---
Attachment: HBASE-15174_5.patch

This is what was committed. I created a private method for the common part of 
class loading but did not cache the class, because only the first test case 
takes 7 seconds; after that, each test takes just 0.2 to 0.3 seconds. Thanks 
all for the reviews. 

> Client Public API should not have PB objects in 2.0
> ---
>
> Key: HBASE-15174
> URL: https://issues.apache.org/jira/browse/HBASE-15174
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-15174.patch, HBASE-15174_1.patch, 
> HBASE-15174_2.patch, HBASE-15174_3.patch, HBASE-15174_4.patch, 
> HBASE-15174_5.patch
>
>
> Some more cleanup for the parent jira. 
> We have leaked some PB structs in Admin (and possibly other places). 
> We should clean up these APIs before 2.0.
> Examples include: 
> {code}
>   AdminProtos.GetRegionInfoResponse.CompactionState getCompactionState(
>       final TableName tableName) throws IOException;
>
>   void snapshot(final String snapshotName, final TableName tableName,
>       HBaseProtos.SnapshotDescription.Type type)
>       throws IOException, SnapshotCreationException, IllegalArgumentException;
>
>   MasterProtos.SnapshotResponse takeSnapshotAsync(
>       HBaseProtos.SnapshotDescription snapshot)
>       throws IOException, SnapshotCreationException;
> {code}





[jira] [Commented] (HBASE-15594) [YCSB] Improvements

2016-06-05 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316188#comment-15316188
 ] 

Anoop Sam John commented on HBASE-15594:


Interesting, Stack.

> [YCSB] Improvements
> ---
>
> Key: HBASE-15594
> URL: https://issues.apache.org/jira/browse/HBASE-15594
> Project: HBase
>  Issue Type: Umbrella
>Reporter: stack
>Priority: Critical
> Attachments: fast.patch
>
>
> Running YCSB and getting good results is an arcane art. For example, in my 
> testing, a few handlers (100) with as many readers as I had CPUs (48), and 
> upping connections on clients to same as #cpus made for 2-3x the throughput. 
> The above config changes came from lore; which configurations need tweaking 
> is not obvious going by their names, there was no indication from the app of 
> where/why we were blocked or of which metrics are important to consider, nor 
> was any of this stuff written down in the docs.
> Even still, I am stuck trying to make use of all of the machine. I am unable 
> to overrun a server through 8 client nodes trying to beat up a single node 
> (workloadc, all random-read, with no data returned; -p readallfields=false). 
> There is also a strange phenomenon where if I add a few machines, rather than 
> 3x the YCSB throughput when 3 nodes in cluster, each machine instead is doing 
> about 1/3rd.
> This umbrella issue is to host items that improve our defaults and noting how 
> to get good numbers running YCSB. In particular, I want to be able to 
> saturate a machine.
> Here are the configs I'm currently working with. I've not done the work to 
> figure client-side if they are optimal (weird is how big a difference 
> client-side changes can make -- need to fix this). On my 48 cpu machine, I 
> can do about 370k random reads a second from data totally cached in 
> bucketcache. If I short-circuit the user Gets so they do no work but return 
> immediately, I can do 600k ops a second, but the CPUs are only at 60-70%. I 
> cannot get them to go above this. Working on it.
> {code}
> <property>
>   <name>hbase.ipc.server.read.threadpool.size</name>
>   <value>48</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.client.ipc.pool.size</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.htable.threads.max</name>
>   <value>48</value>
> </property>
> {code}





[jira] [Commented] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization constrains random read

2016-06-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316153#comment-15316153
 ] 

stack commented on HBASE-15716:
---

An interesting observation: after hacking out this lock, I then ran into being 
blocked on responding... 

{code}
"RpcServer.reader=1,bindAddress=ve0528.halxg.cloudera.com,port=16020" #34 
daemon prio=5 os_prio=0 tid=0x7fa76d886800 nid=0x59f0 runnable 
[0x7f9f515e9000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:501)
- locked <0x7fa41f096f40> (a java.lang.Object)
- locked <0x7fa41f096f28> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.BufferChain.write(BufferChain.java:105)
at 
org.apache.hadoop.hbase.ipc.RpcServer.channelWrite(RpcServer.java:2401)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Responder.processResponse(RpcServer.java:1072)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Responder.doRespond(RpcServer.java:1136)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Call.sendResponseIfReady(RpcServer.java:570)
- locked <0x7f9fbf7652d0> (a 
org.apache.hadoop.hbase.ipc.RpcServer$Call)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:139)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.dispatch(SimpleRpcScheduler.java:274)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1871)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1762)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Connection.process(RpcServer.java:1608)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1588)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:838)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:696)
- locked <0x7fa06a26acc0> (a 
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:667)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


Other notes on this synchronization: as the throughput goes up, it becomes 
more of an obstacle. At rates of hundreds of ops a second, the churn in the 
CSLM shows... I should be able to use an array of volatiles or something sized 
by handlers/readers. I should also be able to do something with the fact that 
the read point is always incrementing... will be back.
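A hypothetical sketch of the lock-free direction mooted above: a concurrent map of per-scanner read points instead of a synchronized CSLM, with the smallest outstanding read point found by scanning the values. This is an illustration only, not the HBase implementation, and it ignores the register-vs-advance race that the real synchronization guards against:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Track per-scanner read points without a global lock: registration is a
// lock-free map insert; smallestReadPoint() streams the live values.
public class ScannerReadPoints {
  private final AtomicLong mvcc = new AtomicLong();        // always-incrementing read point
  private final AtomicLong scannerIds = new AtomicLong();
  private final ConcurrentHashMap<Long, Long> readPoints = new ConcurrentHashMap<>();

  // Register a scanner: pin the current read point, no synchronized block.
  public long register() {
    long id = scannerIds.incrementAndGet();
    readPoints.put(id, mvcc.get());
    return id;
  }

  public void unregister(long id) { readPoints.remove(id); }

  public void advance() { mvcc.incrementAndGet(); }

  // Smallest read point still held by an open scanner (current mvcc if none).
  public long smallestReadPoint() {
    long min = mvcc.get();
    for (long rp : readPoints.values()) min = Math.min(min, rp);
    return min;
  }

  public static void main(String[] args) {
    ScannerReadPoints points = new ScannerReadPoints();
    points.advance();                               // mvcc = 1
    long a = points.register();                     // scanner a pinned at 1
    points.advance();                               // mvcc = 2
    System.out.println(points.smallestReadPoint()); // 1
    points.unregister(a);
    System.out.println(points.smallestReadPoint()); // 2
  }
}
```

Compactions need the smallest read point only occasionally, so paying an O(scanners) scan there in exchange for lock-free registration on the hot path is the trade this sketch makes.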

> HRegion#RegionScannerImpl scannerReadPoints synchronization constrains random 
> read
> --
>
> Key: HBASE-15716
> URL: https://issues.apache.org/jira/browse/HBASE-15716
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
> Attachments: 15716.prune.synchronizations.patch, 
> 15716.prune.synchronizations.v3.patch, 15716.prune.synchronizations.v4.patch, 
> 15716.prune.synchronizations.v4.patch, 15716.wip.more_to_be_done.patch, 
> Screen Shot 2016-04-26 at 2.05.45 PM.png, Screen Shot 2016-04-26 at 2.06.14 
> PM.png, Screen Shot 2016-04-26 at 2.07.06 PM.png, Screen Shot 2016-04-26 at 
> 2.25.26 PM.png, Screen Shot 2016-04-26 at 6.02.29 PM.png, Screen Shot 
> 2016-04-27 at 9.49.35 AM.png, 
> current-branch-1.vs.NoSynchronization.vs.Patch.png, hits.png, 
> remove_cslm.patch
>
>
> Here is a [~lhofhansl] special.
> When we construct the region scanner, we get our read point and then store it 
> with the scanner instance in a Region scoped CSLM. This is done under a 
> synchronize on the CSLM.
> This synchronize on a region-scoped Map creating region scanners is the 
> outstanding point of lock contention according to flight recorder (My work 
> load is workload c, random reads).





[jira] [Commented] (HBASE-15499) Add multiple data type support for increment

2016-06-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316150#comment-15316150
 ] 

stack commented on HBASE-15499:
---

[~heliangliang] What do you think of [~ndimiduk]'s comment, boss? Thanks.

> Add multiple data type support for increment
> 
>
> Key: HBASE-15499
> URL: https://issues.apache.org/jira/browse/HBASE-15499
> Project: HBase
>  Issue Type: New Feature
>  Components: API
>Reporter: He Liangliang
>Assignee: He Liangliang
> Attachments: HBASE-15499-V2.diff, HBASE-15499-V3.diff, 
> HBASE-15499-V4.diff, HBASE-15499-V5.diff, HBASE-15499-V5.patch, 
> HBASE-15499.diff
>
>
> Currently the increment assumes a long with byte-wise serialization. It's 
> useful to support flexible data types/serializations.





[jira] [Issue Comment Deleted] (HBASE-15406) Split / merge switch left disabled after early termination of hbck

2016-06-05 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-15406:
--
Comment: was deleted

(was: {quote}
 When admin / HBCK puts the master in maintenance mode, she can optionally 
supply an ephemeral znode path that the master will watch. As soon as all 
ephemeral nodes goes away, master will go out of maintenance mode. Every 
instance of HBCK creates an ephemeral znode, so that even more than one 
instance is running, there won't be issues if one finishes, while the others 
are going. wdyt?
{quote}

[~enis] Have you seen [~stack]'s comments above?

{quote}
To be more clear, -1 on a patch that has master doing a rollback of a state set 
by external administrator's tool. HBCK already leaves the cluster in a state of 
disequilibrium when killed mid-processing... Usual way this is addressed is 
HBCK gets rerun... not master does cleanup.
{quote}

The first patch created an ephemeral node while hbck was running, with the 
master watching it; when hbck aborted, the master rolled back the state. But as 
[~stack] commented, "Usual way is HBCK gets rerun, not master cleanup".

)

> Split / merge switch left disabled after early termination of hbck
> --
>
> Key: HBASE-15406
> URL: https://issues.apache.org/jira/browse/HBASE-15406
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Heng Chen
>Priority: Critical
>  Labels: reviewed
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, 
> HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This is what I did on a cluster with a 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on a gateway node of the cluster
> Terminate hbck early
> Enter the hbase shell, where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> The expectation is that the split / merge switches are restored to their 
> default values after hbck exits.





[jira] [Commented] (HBASE-15406) Split / merge switch left disabled after early termination of hbck

2016-06-05 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316143#comment-15316143
 ] 

Heng Chen commented on HBASE-15406:
---

{quote}
 When admin / HBCK puts the master in maintenance mode, she can optionally 
supply an ephemeral znode path that the master will watch. As soon as all 
ephemeral nodes goes away, master will go out of maintenance mode. Every 
instance of HBCK creates an ephemeral znode, so that even more than one 
instance is running, there won't be issues if one finishes, while the others 
are going. wdyt?
{quote}

[~enis] Have you seen [~stack]'s comments above?

{quote}
To be more clear, -1 on a patch that has master doing a rollback of a state set 
by external administrator's tool. HBCK already leaves the cluster in a state of 
disequilibrium when killed mid-processing... Usual way this is addressed is 
HBCK gets rerun... not master does cleanup.
{quote}

The first patch created an ephemeral node while hbck was running, with the 
master watching it; when hbck aborted, the master rolled back the state. But as 
[~stack] commented, "Usual way is HBCK gets rerun, not master cleanup".
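The ephemeral-node scheme in the first quote can be modeled in memory as follows. This is a sketch of the idea only; the real proposal uses ephemeral ZooKeeper znodes, which disappear automatically if an HBCK process dies:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Model: each HBCK instance registers a marker; the master stays in
// maintenance mode while any marker remains, so one instance finishing does
// not flip the switch while other instances are still running.
public class MaintenanceMode {
  private final Set<String> hbckMarkers = ConcurrentHashMap.newKeySet();

  public void hbckStarted(String instanceId) { hbckMarkers.add(instanceId); }

  // Called on clean exit; an ephemeral znode would also vanish on a crash.
  public void hbckFinished(String instanceId) { hbckMarkers.remove(instanceId); }

  public boolean inMaintenanceMode() { return !hbckMarkers.isEmpty(); }

  public static void main(String[] args) {
    MaintenanceMode m = new MaintenanceMode();
    m.hbckStarted("hbck-1");
    m.hbckStarted("hbck-2");
    m.hbckFinished("hbck-1");
    System.out.println(m.inMaintenanceMode()); // true: hbck-2 still running
    m.hbckFinished("hbck-2");
    System.out.println(m.inMaintenanceMode()); // false: all instances done
  }
}
```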



> Split / merge switch left disabled after early termination of hbck
> --
>
> Key: HBASE-15406
> URL: https://issues.apache.org/jira/browse/HBASE-15406
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Heng Chen
>Priority: Critical
>  Labels: reviewed
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, 
> HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This is what I did on a cluster with a 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on a gateway node of the cluster
> Terminate hbck early
> Enter the hbase shell, where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> The expectation is that the split / merge switches are restored to their 
> default values after hbck exits.





[jira] [Commented] (HBASE-15406) Split / merge switch left disabled after early termination of hbck

2016-06-05 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316144#comment-15316144
 ] 

Heng Chen commented on HBASE-15406:
---

{quote}
 When admin / HBCK puts the master in maintenance mode, she can optionally 
supply an ephemeral znode path that the master will watch. As soon as all 
ephemeral nodes goes away, master will go out of maintenance mode. Every 
instance of HBCK creates an ephemeral znode, so that even more than one 
instance is running, there won't be issues if one finishes, while the others 
are going. wdyt?
{quote}

[~enis] Have you seen [~stack]'s comments above?

{quote}
To be more clear, -1 on a patch that has master doing a rollback of a state set 
by external administrator's tool. HBCK already leaves the cluster in a state of 
disequilibrium when killed mid-processing... Usual way this is addressed is 
HBCK gets rerun... not master does cleanup.
{quote}

The first patch created an ephemeral node while hbck was running, with the 
master watching it; when hbck aborted, the master rolled back the state. But as 
[~stack] commented, "Usual way is HBCK gets rerun, not master cleanup".



> Split / merge switch left disabled after early termination of hbck
> --
>
> Key: HBASE-15406
> URL: https://issues.apache.org/jira/browse/HBASE-15406
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Heng Chen
>Priority: Critical
>  Labels: reviewed
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, 
> HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This is what I did on a cluster with a 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on a gateway node of the cluster
> Terminate hbck early
> Enter the hbase shell, where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> The expectation is that the split / merge switches are restored to their 
> default values after hbck exits.





[jira] [Commented] (HBASE-15594) [YCSB] Improvements

2016-06-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316142#comment-15316142
 ] 

stack commented on HBASE-15594:
---

So, again, having the Reader do the whole read/parse of the request and then 
execute it ups our ops by >2x (from about 125k to 425k workloadc random reads 
from LRUBlockCache, at about 7-11% CPU idle). The new occupied-readers-count 
metric shows the Readers occupied nearly all the time, as opposed to what we 
see when we look at handlers (I can't get higher utilization on handlers no 
matter what load I put up). Mighty [~tlipcon] pointed me at a short-circuit the 
Kudu folks do, where they hand off directly from the reader to the worker 
thread (http://gerrit.cloudera.org:8080/#/c/2938/); let me see if I can do 
similar.

After the above hackery, the next 'blocker' is the registry of Scanners in the 
Region CSLM, synchronized to get the read point. If I hack it out -- I have 
some ideas for making it less of a hurdle -- it is interesting to see that we 
then get stuck behind sending the response AND our throughput goes down 
slightly... So, some work to do here.
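The reader-to-worker direct handoff can be sketched with a SynchronousQueue, which has no internal buffer, so the reader blocks until a worker is ready to take the parsed call. This illustrates the idea only; it is not the Kudu or HBase implementation:

```java
import java.util.concurrent.SynchronousQueue;

// Direct handoff: a SynchronousQueue has no buffer, so the reader thread
// hands a parsed call straight to a waiting handler instead of parking it
// on a scheduler queue in between.
public class DirectHandoff {
  public static void main(String[] args) throws InterruptedException {
    SynchronousQueue<String> handoff = new SynchronousQueue<>();

    Thread handler = new Thread(() -> {
      try {
        String call = handoff.take();  // blocks until a reader hands off a call
        System.out.println("handled " + call);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    handler.start();

    handoff.put("get-row-42");         // reader: blocks until a handler takes it
    handler.join();
  }
}
```

The same handoff semantics come for free from a ThreadPoolExecutor built over a SynchronousQueue, which is how cached thread pools avoid queueing delay.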

> [YCSB] Improvements
> ---
>
> Key: HBASE-15594
> URL: https://issues.apache.org/jira/browse/HBASE-15594
> Project: HBase
>  Issue Type: Umbrella
>Reporter: stack
>Priority: Critical
> Attachments: fast.patch
>
>
> Running YCSB and getting good results is an arcane art. For example, in my 
> testing, a few handlers (100) with as many readers as I had CPUs (48), and 
> upping connections on clients to same as #cpus made for 2-3x the throughput. 
> The above config changes came from lore; which configurations need tweaking 
> is not obvious going by their names, there was no indication from the app of 
> where/why we were blocked or of which metrics are important to consider, nor 
> was any of this stuff written down in the docs.
> Even still, I am stuck trying to make use of all of the machine. I am unable 
> to overrun a server through 8 client nodes trying to beat up a single node 
> (workloadc, all random-read, with no data returned; -p readallfields=false). 
> There is also a strange phenomenon where if I add a few machines, rather than 
> 3x the YCSB throughput when 3 nodes in cluster, each machine instead is doing 
> about 1/3rd.
> This umbrella issue is to host items that improve our defaults and noting how 
> to get good numbers running YCSB. In particular, I want to be able to 
> saturate a machine.
> Here are the configs I'm currently working with. I've not done the work to 
> figure client-side if they are optimal (weird is how big a difference 
> client-side changes can make -- need to fix this). On my 48 cpu machine, I 
> can do about 370k random reads a second from data totally cached in 
> bucketcache. If I short-circuit the user Gets so they do no work but return 
> immediately, I can do 600k ops a second, but the CPUs are only at 60-70%. I 
> cannot get them to go above this. Working on it.
> {code}
> <property>
>   <name>hbase.ipc.server.read.threadpool.size</name>
>   <value>48</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.client.ipc.pool.size</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.htable.threads.max</name>
>   <value>48</value>
> </property>
> {code}





[jira] [Updated] (HBASE-15594) [YCSB] Improvements

2016-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15594:
--
Attachment: fast.patch

Hackery: how to run the Call on the Reader thread and short-circuit Gets.

> [YCSB] Improvements
> ---
>
> Key: HBASE-15594
> URL: https://issues.apache.org/jira/browse/HBASE-15594
> Project: HBase
>  Issue Type: Umbrella
>Reporter: stack
>Priority: Critical
> Attachments: fast.patch
>
>
> Running YCSB and getting good results is an arcane art. For example, in my 
> testing, a few handlers (100) with as many readers as I had CPUs (48), and 
> upping connections on clients to same as #cpus made for 2-3x the throughput. 
> The above config changes came from lore; which configurations need tweaking 
> is not obvious going by their names, there was no indication from the app of 
> where/why we were blocked or of which metrics are important to consider, nor 
> was any of this stuff written down in the docs.
> Even still, I am stuck trying to make use of all of the machine. I am unable 
> to overrun a server through 8 client nodes trying to beat up a single node 
> (workloadc, all random-read, with no data returned; -p readallfields=false). 
> There is also a strange phenomenon where if I add a few machines, rather than 
> 3x the YCSB throughput when 3 nodes in cluster, each machine instead is doing 
> about 1/3rd.
> This umbrella issue is to host items that improve our defaults and noting how 
> to get good numbers running YCSB. In particular, I want to be able to 
> saturate a machine.
> Here are the configs I'm currently working with. I've not done the work to 
> figure client-side if they are optimal (weird is how big a difference 
> client-side changes can make -- need to fix this). On my 48 cpu machine, I 
> can do about 370k random reads a second from data totally cached in 
> bucketcache. If I short-circuit the user Gets so they do no work but return 
> immediately, I can do 600k ops a second, but the CPUs are only at 60-70%. I 
> cannot get them to go above this. Working on it.
> {code}
> <property>
>   <name>hbase.ipc.server.read.threadpool.size</name>
>   <value>48</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.client.ipc.pool.size</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.htable.threads.max</name>
>   <value>48</value>
> </property>
> {code}





[jira] [Resolved] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

2016-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6103.
--
Resolution: Not A Problem

Resolving as not a problem any more. The basic idea was done in HBASE-2941. 
HBASE-15967 then does the sizing by CPU count, the part of this issue that was 
superior to HBASE-2941.

> Optimize the HBaseServer to deserialize the data for each ipc connection in 
> parallel
> 
>
> Key: HBASE-6103
> URL: https://issues.apache.org/jira/browse/HBASE-6103
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: HBASE-6103-fb-89.patch
>
>
> Currently HBaseServer runs with a single listener thread, which is 
> responsible for accepting connections, reading data from the network 
> channel, deserializing the data into writable objects, and handing it over 
> to the IPC handler threads.
> When multiple HBase clients connect to the region server (HBaseServer) and 
> read/write a large set of data, the listener and the responder thread become 
> a performance bottleneck.
> So the solution is to deserialize the data for each ipc connection in 
> parallel in HBaseServer.
> BTW, this is also one of the reasons that parallel scanning from multiple 
> clients is far slower than the single-client case.





[jira] [Commented] (HBASE-15967) Metric for active ipc Readers and make default fraction of cpu count

2016-06-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315997#comment-15315997
 ] 

stack commented on HBASE-15967:
---

HBASE-2941 added the fixed-size pool of Readers, whereas a later, alternative 
implementation, HBASE-6103, sized the count of Readers based on CPUs.

> Metric for active ipc Readers and make default fraction of cpu count
> 
>
> Key: HBASE-15967
> URL: https://issues.apache.org/jira/browse/HBASE-15967
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
> Attachments: HBASE-15967.master.001.patch
>
>
> Our ipc Readers are hard coded at 10 regardless since . Running with fewer 
> Readers, we go faster (e.g. 12 Readers has us doing 135k with workloadc and 
> 6 Readers has us doing 145k), but it is hard to tell what count of Readers 
> is needed since there is no metric.
> This issue changes the Readers default to 1/4 of the installed CPUs or 8, 
> whichever is smaller, and adds a new hbase.regionserver.ipc.runningReaders 
> metric so you have a chance of seeing what is needed.





[jira] [Updated] (HBASE-15967) Metric for active ipc Readers and make default fraction of cpu count

2016-06-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15967:
--
Attachment: HBASE-15967.master.001.patch

> Metric for active ipc Readers and make default fraction of cpu count
> 
>
> Key: HBASE-15967
> URL: https://issues.apache.org/jira/browse/HBASE-15967
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
> Attachments: HBASE-15967.master.001.patch
>
>
> Our ipc Readers are hard coded at 10 regardless since . Running with fewer 
> Readers, we go faster (e.g. 12 Readers has us doing 135k with workloadc and 
> 6 Readers has us doing 145k), but it is hard to tell what count of Readers 
> is needed since there is no metric.
> This issue changes the Readers default to 1/4 of the installed CPUs or 8, 
> whichever is smaller, and adds a new hbase.regionserver.ipc.runningReaders 
> metric so you have a chance of seeing what is needed.





[jira] [Created] (HBASE-15967) Metric for active ipc Readers and make default fraction of cpu count

2016-06-05 Thread stack (JIRA)
stack created HBASE-15967:
-

 Summary: Metric for active ipc Readers and make default fraction 
of cpu count
 Key: HBASE-15967
 URL: https://issues.apache.org/jira/browse/HBASE-15967
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack


Our ipc Readers are hard coded at 10 regardless since . Running with fewer 
Readers, we go faster (e.g. 12 Readers has us doing 135k with workloadc and 6 
Readers has us doing 145k), but it is hard to tell what count of Readers is 
needed since there is no metric.

This issue changes the Readers default to 1/4 of the installed CPUs or 8, 
whichever is smaller, and adds a new hbase.regionserver.ipc.runningReaders 
metric so you have a chance of seeing what is needed.
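The proposed sizing rule can be sketched as follows; the floor of 1 is an added assumption for machines with fewer than 4 CPUs, not something stated in the issue:

```java
// Sketch of the proposed default: a quarter of the available CPUs,
// capped at 8 (min of the two), with an assumed floor of 1 Reader.
public class ReaderCount {
  static int defaultReaderCount(int cpus) {
    return Math.max(1, Math.min(8, cpus / 4));
  }

  public static void main(String[] args) {
    System.out.println(defaultReaderCount(48)); // 8  (48/4 = 12, capped at 8)
    System.out.println(defaultReaderCount(16)); // 4
    System.out.println(defaultReaderCount(2));  // 1  (assumed floor)
  }
}
```

On the 48-CPU test machine above this yields 8 Readers, close to the 6-12 range where the workloadc numbers peaked.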





[jira] [Assigned] (HBASE-15112) Allow coprocessors to extend 'software attributes' list

2016-06-05 Thread Matt Warhaftig (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Warhaftig reassigned HBASE-15112:
--

Assignee: Matt Warhaftig

> Allow coprocessors to extend 'software attributes' list
> ---
>
> Key: HBASE-15112
> URL: https://issues.apache.org/jira/browse/HBASE-15112
> Project: HBase
>  Issue Type: Improvement
>  Components: Coprocessors
>Reporter: Nick Dimiduk
>Assignee: Matt Warhaftig
>
> Over on the {{/master-status}} and {{/rs-status}} pages we have a list of 
> release properties, giving details about the cluster deployment. We should 
> make this an extension point, allowing coprocessors to register information 
> about themselves as well. For example, Phoenix, Trafodion, Tephra, &c might 
> want to advertise installed version and build details as well.





[jira] [Commented] (HBASE-9465) HLog entries are not pushed to peer clusters serially when region-move or RS failure in master cluster

2016-06-05 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315781#comment-15315781
 ] 

Phil Yang commented on HBASE-9465:
--

{quote}
Can we use hbase:replication instead of hbase:meta for bookkeeping ?
{quote}
If I am not wrong, HBASE-15583 and HBASE-15867 are using a system table to 
replace the ZK implementation; hbase:replication is used for tracking tasks and 
queues, whose rowkey is not a region name. But in this issue we need a table/cf 
to save some region-level information, and it is convenient to use hbase:meta 
for this because we can merge the update with the region status update into one 
atomic operation on one table and one row when we open/merge/split a region, 
with no extra execution time. 

And this issue also works well with the ZK implementation and can be ported to 
the 1.x branches and even 0.98 (our production branch is based on 0.98). If we 
rely on hbase:replication's implementation, this patch can only work on 
versions with HBASE-15583 fixed.

> HLog entries are not pushed to peer clusters serially when region-move or RS 
> failure in master cluster
> --
>
> Key: HBASE-9465
> URL: https://issues.apache.org/jira/browse/HBASE-9465
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, Replication
>Reporter: Honghua Feng
>Assignee: Phil Yang
>
> When a region move or RS failure occurs in the master cluster, the hlog 
> entries not pushed before the region move or RS failure will be pushed by 
> the original RS (for a region move) or by another RS that takes over the 
> remaining hlog of the dead RS (for an RS failure), while the new entries for 
> the same region(s) will be pushed by the RS that now serves the region(s); 
> they push the hlog entries of the same region concurrently, without 
> coordination.
> This treatment can possibly lead to data inconsistency between the master 
> and peer clusters:
> 1. a put and then a delete are written to the master cluster
> 2. due to region move / RS failure, they are pushed to the peer cluster by 
> different replication-source threads
> 3. if the delete is pushed to the peer cluster before the put, and a flush 
> and major compaction occur in the peer cluster before the put is pushed, the 
> delete is collected while the put remains in the peer cluster
> In this scenario the put remains in the peer cluster, but in the master 
> cluster the put is masked by the delete; hence data inconsistency between 
> the master and peer clusters.


