[jira] [Commented] (HBASE-21034) Add new throttle type: read/write capacity unit

2019-01-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745755#comment-16745755
 ] 

Esteban Gutierrez commented on HBASE-21034:
---

I'm -1 on having this new feature in a maintenance release. I think the right 
approach is to revert it. It won't be a good precedent to let this go through, 
as [~sershe] said.

> Add new throttle type: read/write capacity unit
> ---
>
> Key: HBASE-21034
> URL: https://issues.apache.org/jira/browse/HBASE-21034
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21034.branch-2.0.001.patch, 
> HBASE-21034.branch-2.0.001.patch, HBASE-21034.branch-2.1.001.patch, 
> HBASE-21034.branch-2.1.001.patch, HBASE-21034.master.001.patch, 
> HBASE-21034.master.002.patch, HBASE-21034.master.003.patch, 
> HBASE-21034.master.004.patch, HBASE-21034.master.005.patch, 
> HBASE-21034.master.006.patch, HBASE-21034.master.006.patch, 
> HBASE-21034.master.007.patch, HBASE-21034.master.007.patch
>
>
> Add a new throttle type: read/write capacity units, similar to DynamoDB.
> One read capacity unit represents one read of up to 1 KB of data per time 
> unit. If the data size is larger than 1 KB, additional read capacity units 
> are consumed.
> One write capacity unit represents one write of up to 1 KB of data per time 
> unit. If the data size is larger than 1 KB, additional write capacity units 
> are consumed.
> For example, 100 read capacity units per second means that an HBase user can 
> read 1 KB of data 100 times per second, or 2 KB of data 50 times per second, 
> and so on.
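The capacity-unit accounting described above can be sketched as follows. This 
is a minimal illustration of the 1 KB rounding rule only, not the actual HBase 
quota code; the function name is made up:

```python
import math

KB = 1024

def capacity_units(size_bytes):
    # One capacity unit covers up to 1 KB; larger requests consume
    # additional units, rounded up (minimum of one unit per request).
    return max(1, math.ceil(size_bytes / KB))

# 100 read capacity units per second allow, e.g.:
#   100 reads of 1 KB (100 * 1 unit), or
#    50 reads of 2 KB ( 50 * 2 units)
print(capacity_units(1 * KB))      # -> 1
print(capacity_units(2 * KB))      # -> 2
print(capacity_units(2 * KB + 1))  # -> 3 (2 KB + 1 byte rounds up)
```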



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21034) Add new throttle type: read/write capacity unit

2019-01-18 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746427#comment-16746427
 ] 

Esteban Gutierrez commented on HBASE-21034:
---

[~zghaobac]:
{quote}
This feature is small, and it was backported to branch-1. If we don't backport 
this to branch-2.1 and a user uses this feature in a 1.x version, they can't 
rolling-upgrade to a 2.1.* version?
{quote}
That can obviously be the case too while performing a rolling upgrade to a 
previous maintenance release from branch-2.1, and that's why it is important 
to avoid this kind of thing. Even if this is a small feature, as a few have 
mentioned here, it adds changes to our protobuf specs, and I think that's 
quite a stretch for a maintenance release.







[jira] [Created] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-22253:
-

 Summary: An AuthenticationTokenSecretManager leader won't step 
down if another RS claims to be a leader
 Key: HBASE-22253
 URL: https://issues.apache.org/jira/browse/HBASE-22253
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 2.1.0, 3.0.0, 2.2.0
Reporter: Esteban Gutierrez


We ran into a situation where a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be the 
AuthenticationTokenSecretManager leader for an HBase cluster. This situation is 
undesirable since the leader running on the HBase cluster doesn't step down 
when the rogue leader registers in the HBase cluster, and both will start 
rolling keys with the same IDs, causing authentication errors. Even if a 
reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
should make sure that we step down as leader correctly.






[jira] [Updated] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-22253:
--
Description: 
We ran into a situation where a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be the 
AuthenticationTokenSecretManager leader for an HBase cluster. This situation is 
undesirable since the leader running on the HBase cluster doesn't step down 
when the rogue leader registers in the HBase cluster, and both will start 
rolling keys with the same IDs, causing authentication errors. Even if a 
reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
should make sure that we step down as leader correctly.


  was:
We ran into a situation were a rouge Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be 
AuthenticationTokenSecretManager for an HBase cluster. This situation 
undesirable since the leader running on the HBase cluster doesn't steps down 
when the rouge leader registers in the HBase cluster and both will start 
rolling keys with the same IDs causing authentication errors. Even a reasonable 
"fix" is to point to a different {{zookeeper.znode.parent}}, we should make 
sure that we step down as leader correctly.



> An AuthenticationTokenSecretManager leader won't step down if another RS 
> claims to be a leader
> --
>
> Key: HBASE-22253
> URL: https://issues.apache.org/jira/browse/HBASE-22253
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Esteban Gutierrez
>Priority: Critical
>
> We ran into a situation where a rogue Lily HBase Indexer [SEP 
> Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
>  sharing the same {{zookeeper.znode.parent}} claimed to be the 
> AuthenticationTokenSecretManager leader for an HBase cluster. This situation 
> is undesirable since the leader running on the HBase cluster doesn't step 
> down when the rogue leader registers in the HBase cluster, and both will 
> start rolling keys with the same IDs, causing authentication errors. Even if 
> a reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
> should make sure that we step down as leader correctly.





[jira] [Updated] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-22253:
--
Description: 
We ran into a situation where a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be the 
AuthenticationTokenSecretManager leader for an HBase cluster. This situation is 
undesirable since the leader running on the HBase cluster doesn't step down 
when the rogue leader registers in the HBase cluster, and both will start 
rolling keys with the same IDs, causing authentication errors. Even if a 
reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
should make sure that we step down as leader correctly.


  was:
We ran into a situation were a rouge Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be 
AuthenticationTokenSecretManager for an HBase cluster. This situation 
undesirable since the leader running on the HBase cluster doesn't steps down 
when the rogue leader registers in the HBase cluster and both will start 
rolling keys with the same IDs causing authentication errors. Even a reasonable 
"fix" is to point to a different {{zookeeper.znode.parent}}, we should make 
sure that we step down as leader correctly.








[jira] [Assigned] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-22253:
-

Assignee: Esteban Gutierrez






[jira] [Commented] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819413#comment-16819413
 ] 

Esteban Gutierrez commented on HBASE-22253:
---

bq. related: if we are leader and the leader znode is deleted we should step 
down
Yeah, probably we should make sure that the session timeout for the keymaster 
znode is shorter than the sleep interval for the LeaderElector.
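The step-down behavior the report asks for can be sketched as a toy model. All 
names here are hypothetical; the real implementation uses ZooKeeper watchers on 
the leader znode rather than a shared dict:

```python
class ToyLeaderElector:
    """Toy model of leader election over a shared 'znode' slot.

    `znode` stands in for the ZooKeeper leader znode; whoever wrote it
    last is the claimed leader. Illustrative sketch, not HBase code.
    """

    def __init__(self, name, znode):
        self.name = name
        self.znode = znode      # shared dict: {"leader": <name> or None}
        self.is_leader = False

    def try_become_leader(self):
        # Claim leadership only if the slot is currently empty.
        if self.znode.get("leader") is None:
            self.znode["leader"] = self.name
            self.is_leader = True
        return self.is_leader

    def check_leadership(self):
        # The behavior HBASE-22253 asks for: if someone else now holds
        # the znode (a rogue claimant overwrote it), step down instead
        # of continuing to roll keys as a second leader.
        if self.is_leader and self.znode.get("leader") != self.name:
            self.is_leader = False
        return self.is_leader


znode = {"leader": None}
rs = ToyLeaderElector("regionserver-1", znode)
rs.try_become_leader()             # regionserver-1 becomes leader
znode["leader"] = "rogue-indexer"  # rogue SEP consumer claims leadership
rs.check_leadership()              # regionserver-1 steps down -> False
```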






[jira] [Commented] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2019-04-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820478#comment-16820478
 ] 

Esteban Gutierrez commented on HBASE-22263:
---

bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.

> Master creates duplicate ServerCrashProcedure on initialization, leading to 
> assignment hanging in region-dense clusters
> ---
>
> Key: HBASE-22263
> URL: https://issues.apache.org/jira/browse/HBASE-22263
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>
> h3. Problem:
> During Master initialization we
>  # restore existing procedures that still need to run from prior active 
> Master instances
>  # look for signs that Region Servers have died and need to be recovered 
> while we were out, and schedule a ServerCrashProcedure (SCP) for each of them
>  # turn on the assignment manager
> The normal turn of events for a ServerCrashProcedure will attempt to use a 
> bulk assignment to maintain the set of regions on a RS if possible. However, 
> we wait around and retry a bit later if the assignment manager isn’t ready 
> yet.
> Note that currently #2 has no notion of whether or not a previous active 
> Master instance has already done a check. This means we might schedule an 
> SCP for a ServerName (host, port, start code) that already has an SCP 
> scheduled. Ideally, such a duplicate should be a no-op.
> However, before step #2 schedules the SCP it first marks the region server as 
> dead and not yet processed, with the expectation that the SCP it just created 
> will look if there is log splitting work and then mark the server as ready for 
> region assignment. At the same time, any restored SCPs that are past the step 
> of log splitting will be waiting for the AssignmentManager still. As a part 
> of restoring themselves, they do not update with the current master instance 
> to show that they are past the point of WAL processing.
> Once the AssignmentManager starts in #3 the restored SCP continues; it will 
> eventually get to the assignment phase and find that its server is marked as 
> dead and in need of wal processing. Such assignments are skipped with a log 
> message. Thus as we iterate over the regions to assign we’ll skip all of 
> them. This non-intuitively shifts the “no-op” status from the newer SCP we 
> scheduled at #2 to the older SCP that was restored in #1.
> Bulk assignment works by sending the assign calls via a pool to allow more 
> parallelism. Once we’ve set up the pool we just wait to see if the region 
> state updates to online. Unfortunately, since all of the assigns got skipped, 
> we’ll never change the state for any of these regions. That means the bulk 
> assign, and the older SCP that started it, will wait until it hits a timeout.
> By default the timeout for a bulk assignment is the smaller of {{(# Regions 
> in the plan * 10s)}} or {{(# Regions in the most loaded RS in the plan * 1s + 
> 60s + # of RegionServers in the cluster * 30s)}}. For even modest clusters 
> with several hundreds of regions per region server, this means the “no-op” 
> SCP will end up waiting ~tens-of-minutes (e.g. ~50 minutes for an average 
> region density of 300 regions per region server on a 100 node cluster. ~11 
> minutes for 300 regions per region server on a 10 node cluster). During this 
> time, the SCP will hold one of the available procedure execution slots for 
> both the overall pool and for the specific server queue.
> As previously mentioned, restored SCPs will retry their submission if the 
> assignment manager has not yet been activated (done in #3), this can cause 
> them to be scheduled after the newer SCPs (created in #2). Thus the order of 
> execution of no-op and usable SCPs can vary from run-to-run of master 
> initialization.
> This means that unless you get lucky with SCP ordering, impacted regions will 
> remain as RIT for an extended period of time. If you get particularly unlucky 
> and a critical system table is included in the regions that are being 
> recovered, then master initialization itself will end up blocked on this 
> sequence of SCP timeouts. If there are enough of them to exceed the master 
> initialization timeouts, then the situation can be self-sustaining as 
> additional master fails over cause even more duplicative SCPs to be scheduled.
> h3. Indicators:
>  * Master appears to hang; failing to assign regions to available region 
> servers.
>  * Master
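A back-of-the-envelope check of the default bulk-assignment timeout quoted 
above. The helper name is hypothetical; the real computation lives in the 
Master's assignment code:

```python
def bulk_assign_timeout_s(regions_in_plan, max_regions_per_rs, num_rs):
    # Smaller of (# regions in the plan * 10s) and
    # (# regions on the most loaded RS * 1s + 60s + # RegionServers * 30s)
    return min(regions_in_plan * 10,
               max_regions_per_rs * 1 + 60 + num_rs * 30)

# One SCP's plan covers the dead server's regions (here 300), at an
# average density of 300 regions per region server:
t100 = bulk_assign_timeout_s(300, 300, 100)  # -> 3000 s, ~50 minutes
t10 = bulk_assign_timeout_s(300, 300, 10)    # -> 660 s, ~11 minutes
```

These match the ~50-minute and ~11-minute figures given in the description.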

[jira] [Comment Edited] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2019-04-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820478#comment-16820478
 ] 

Esteban Gutierrez edited comment on HBASE-22263 at 4/17/19 8:27 PM:


[~apurtell]:
bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.
That's exactly the same approach I tried, but due to the urgency of getting the 
cluster online and seeing the region assignment stagnate, we had to go via a 
different route, e.g. failing over multiple times. 


was (Author: esteban):
bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.


[jira] [Commented] (HBASE-22286) License handling incorrectly lists CDDL/GPLv2+CE as safe to not aggregate

2019-04-22 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823342#comment-16823342
 ] 

Esteban Gutierrez commented on HBASE-22286:
---

+1

> License handling incorrectly lists CDDL/GPLv2+CE as safe to not aggregate
> -
>
> Key: HBASE-22286
> URL: https://issues.apache.org/jira/browse/HBASE-22286
> Project: HBase
>  Issue Type: Bug
>  Components: build, community
>Affects Versions: 3.0.0, 2.3.0, 2.1.5, 2.2.1
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
> Attachments: HBASE-22286.0.patch
>
>
> The template LICENSE/NOTICE stuff currently has cddl/gplv2+ce listed as an 
> acceptable license for dependencies for individual listing.
> LICENSE.vm
> {code}
> ## Whitelist of lower-case licenses that it's safe to not aggregate as above.
> ## Note that this doesn't include ALv2 or the aforementioned aggregate
> ## license mentions.
> ##
> ## See this FAQ link for justifications: 
> https://www.apache.org/legal/resolved.html
> ##
> ## NB: This list is later compared as lower-case. New entries must also be 
> all lower-case
> #set($non_aggregate_fine = [ 'public domain', 'new bsd license', 'bsd 
> license', 'bsd', 'bsd 2-clause license', 'mozilla public license version 
> 1.1', 'mozilla public license version 2.0', 'creative commons attribution 
> license, version 2.5', 'cddl/gplv2+ce' ])
> {code}
> This is not correct. We have to expressly say we're using the CDDL license 
> for those works because we can't provide downstream with the option of 
> GPLv2+CE. Also we have aggregate licensing handling for CDDL licensed works 
> and this is making us miss times when dependencies are supposed to show up 
> under one of them.





[jira] [Commented] (HBASE-19994) Create a new class for RPC throttling exception, make it retryable.

2018-04-06 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428913#comment-16428913
 ] 

Esteban Gutierrez commented on HBASE-19994:
---

+1 but can we add release notes about the new exception and how this will 
impact clients during a rolling restart of HBase where quotas are being used? 
Thanks!

> Create a new class for RPC throttling exception, make it retryable. 
> 
>
> Key: HBASE-19994
> URL: https://issues.apache.org/jira/browse/HBASE-19994
> Project: HBase
>  Issue Type: Improvement
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Major
> Attachments: HBASE-19994-master-v01.patch, 
> HBASE-19994-master-v02.patch, HBASE-19994-master-v03.patch, 
> HBASE-19994-master-v04.patch, HBASE-19994-master-v05.patch, 
> HBASE-19994-master-v06.patch, HBASE-19994-master-v07.patch
>
>
> Based on a discussion at dev mailing list.
>  
> {code:java}
> Thanks Andrew.
> +1 for the second option, I will create a jira for this change.
> Huaxiang
> On Feb 9, 2018, at 1:09 PM, Andrew Purtell  wrote:
> We have
> public class ThrottlingException extends QuotaExceededException
> public class QuotaExceededException extends DoNotRetryIOException
> Let the storage quota limits throw QuotaExceededException directly (based
> on DNRIOE). That seems fine.
> However, ThrottlingException is thrown as a result of a temporal quota,
> so it is inappropriate for this to inherit from DNRIOE, it should inherit
> IOException instead so the client is allowed to retry until successful, or
> until the retry policy is exhausted.
> We are in a bit of a pickle because we've released with this inheritance
> hierarchy, so to change it we will need a new minor, or we will want to
> deprecate ThrottlingException and use a new exception class instead, one
> which does not inherit from DNRIOE.
> On Feb 7, 2018, at 9:25 AM, Huaxiang Sun  wrote:
> Hi Mike,
>   You are right. For rpc throttling, definitely it is retryable. For storage 
> quota, I think it will fail fast (non-retryable).
>   We probably need to separate these two types of exceptions, I will do some 
> more research and follow up.
>   Thanks,
>   Huaxiang
> On Feb 7, 2018, at 9:16 AM, Mike Drob  wrote:
> I think, philosophically, there can be two kinds of QEE -
> For throttling, we can retry. The quota is a temporal quota - you have done
> too many operations this minute, please try again next minute and
> everything will work.
> For storage, we shouldn't retry. The quota is a fixed quota - you have
> exceeded your allotted disk space, please do not try again until you have
> remedied the situation.
> Our current usage conflates the two, sometimes it is correct, sometimes not.
> On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun  wrote:
> Hi Stack,
>  I run into a case that a mapreduce job in hive cannot finish because
> it runs into a QEE.
> I need to look into the hive mr task to see if QEE is not handled
> correctly in hbase code or in hive code.
> I am thinking that if  QEE is a retryable exception, then it should be
> taken care of by the hbase code.
> I will check more and report back.
> Thanks,
> Huaxiang
> On Feb 7, 2018, at 8:23 AM, Stack  wrote:
> QEE being a DNRIOE seems right on the face of it.
> But if throttling, a DNRIOE is inappropriate. Where you seeing a QEE in a
> throttling scenario Huaxiang?
> Thanks,
> S
> {code}
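The retryable/non-retryable split discussed in the quoted thread can be 
sketched in Python. The class names are illustrative stand-ins; the actual fix 
introduces a Java RpcThrottlingException that no longer extends 
DoNotRetryIOException:

```python
import time

class DoNotRetryError(IOError):
    """Fixed quota exceeded (e.g. storage): retrying cannot help."""

class RpcThrottlingError(IOError):
    """Temporal quota exceeded (rate throttle): safe to retry later."""

def call_with_retries(op, retries=3, backoff_s=0.0):
    # Retry only throttling errors; fail fast on fixed-quota errors.
    for attempt in range(retries):
        try:
            return op()
        except DoNotRetryError:
            raise                  # fixed quota: do not retry
        except RpcThrottlingError:
            if attempt == retries - 1:
                raise              # retry policy exhausted
            time.sleep(backoff_s)  # wait for the next time window

calls = {"n": 0}

def throttled_then_ok():
    # Simulated RPC: throttled twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RpcThrottlingError("rate exceeded")
    return "ok"

print(call_with_retries(throttled_then_ok))  # -> ok (on the third attempt)
```

This is why the inheritance change matters: clients key their retry decision 
off the exception type, so a throttle signal derived from DoNotRetryIOException 
aborts jobs that could have simply waited out the quota window.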





[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474644#comment-16474644
 ] 

Esteban Gutierrez commented on HBASE-19572:
---

lgtm [~brfrn169]. Will upload again, just to make sure it still applies to 
master.

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> set in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.
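The intended lookup order, configured value first and hardcoded constant only 
as a fallback, can be sketched as follows. The helper is hypothetical and a 
plain dict stands in for the parsed configuration; 16020 is the default 
RegionServer port in HBase 1.0 and later:

```python
DEFAULT_REGIONSERVER_PORT = 16020  # HConstants.DEFAULT_REGIONSERVER_PORT

def regionserver_port(conf):
    # Prefer the value from hbase-site.xml over the hardcoded constant.
    return int(conf.get("hbase.regionserver.port",
                        DEFAULT_REGIONSERVER_PORT))

print(regionserver_port({"hbase.regionserver.port": "22222"}))  # -> 22222
print(regionserver_port({}))                                    # -> 16020
```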





[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: (was: HBASE-19572.master.003.patch.txt)






[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: HBASE-19572.master.003.patch

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> set in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}





[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: HBASE-19572.master.003.patch.txt

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> set in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}





[jira] [Created] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-20604:
-

 Summary: ProtobufLogReader#readNext can incorrectly loop to the 
same position in the stream until the WAL is rolled
 Key: HBASE-20604
 URL: https://issues.apache.org/jira/browse/HBASE-20604
 Project: HBase
  Issue Type: Bug
  Components: Replication, wal
Affects Versions: 3.0.0, 2.1.0, 1.5.0
Reporter: Esteban Gutierrez


Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
associated with the {{FSDataInputStream}} from the WAL that we are reading. Under 
certain conditions, e.g. when using encryption at rest 
({{CryptoInputStream}}), the stream can return partial data, which can cause a 
premature EOF that causes {{inputStream.getPos()}} to return the same original 
position, making {{ProtobufLogReader#readNext}} retry the reads until 
the WAL is rolled.

The side effect of this issue is that {{ReplicationSource}} can get stuck until 
the WAL is rolled, causing replication delays of up to an hour in some cases.






[jira] [Assigned] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-20604:
-

Assignee: Esteban Gutierrez

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return the same 
> original position, making {{ProtobufLogReader#readNext}} retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Affects Version/s: (was: 1.5.0)
   (was: 2.1.0)
   Status: Patch Available  (was: Open)

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return the same 
> original position, making {{ProtobufLogReader#readNext}} retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return the same 
> original position, making {{ProtobufLogReader#readNext}} retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-22 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.002.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return the same 
> original position, making {{ProtobufLogReader#readNext}} retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

2018-05-22 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484522#comment-16484522
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

Thanks [~Apache9]. I'm looking into injecting a failure in 
{{ProtobufUtil.mergeFrom()}}, or maybe directly into {{FSDataInputStream}}, in 
order to have a more accurate test. 

Attaching a new patch that additionally seeks back to the original position 
of the stream when no KVs are present, so an additional read of the stream 
shouldn't trigger an unnecessary EOFException.
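The seek-back approach described here can be sketched in plain Java. This is an illustration only, not the actual patch: {{RandomAccessFile}} stands in for Hadoop's {{FSDataInputStream}}, and the method name and record framing are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekBackSketch {
    // Remember the position before the read; on a short (partial) read,
    // rewind so the next attempt re-reads the whole record instead of
    // resuming mid-record and hitting a spurious EOF.
    static byte[] tryReadRecord(RandomAccessFile in, int recordLen) throws IOException {
        long originalPos = in.getFilePointer();
        byte[] buf = new byte[recordLen];
        int n = in.read(buf);
        if (n < recordLen) {
            in.seek(originalPos); // partial data: rewind and let the caller retry
            return null;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("wal", ".bin");
        Files.write(p, new byte[] {1, 2, 3}); // only 3 of 5 expected bytes so far
        try (RandomAccessFile in = new RandomAccessFile(p.toFile(), "r")) {
            System.out.println(tryReadRecord(in, 5) == null); // true: partial read
            System.out.println(in.getFilePointer());          // 0: position restored
        }
        Files.delete(p);
    }
}
```

Reading a 5-byte record from a 3-byte file detects the partial read and restores the position, which is the behavior the patch aims for when a {{CryptoInputStream}} returns partial data.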


> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return the same 
> original position, making {{ProtobufLogReader#readNext}} retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-24 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489162#comment-16489162
 ] 

Esteban Gutierrez commented on HBASE-19572:
---

Thanks [~brfrn169]! I will go ahead and commit shortly.


> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, 
> HBASE-19572.master.004.patch, HBASE-19572.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> set in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}





[jira] [Assigned] (HBASE-11625) Reading datablock throws "Invalid HFile block magic" and can not switch to hdfs checksum

2018-05-29 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-11625:
-

Assignee: Esteban Gutierrez  (was: Appy)

> Reading datablock throws "Invalid HFile block magic" and can not switch to 
> hdfs checksum 
> -
>
> Key: HBASE-11625
> URL: https://issues.apache.org/jira/browse/HBASE-11625
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.94.21, 0.98.4, 0.98.5, 1.0.1.1, 1.0.3
>Reporter: qian wang
>Assignee: Esteban Gutierrez
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 1.1.6, 2.0.0
>
> Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz, 
> HBASE-11625-branch-1-v1.patch, HBASE-11625-branch-1.2-v1.patch, 
> HBASE-11625-branch-1.2-v2.patch, HBASE-11625-branch-1.2-v3.patch, 
> HBASE-11625-branch-1.2-v4.patch, HBASE-11625-master-v2.patch, 
> HBASE-11625-master-v3.patch, HBASE-11625-master.patch, 
> HBASE-11625.branch-1.1.001.patch, HBASE-11625.patch, correct-hfile, 
> corrupted-header-hfile
>
>
> When using HBase checksums, a call to {{readBlockDataInternal()}} in 
> HFileBlock.java can encounter file corruption, but it can only switch to the 
> HDFS checksum input stream at {{validateBlockChecksum()}}. If the data 
> block's header is corrupted when {{b = new HFileBlock()}} is constructed, it 
> throws the exception "Invalid HFile block magic" and the RPC call fails





[jira] [Assigned] (HBASE-11625) Reading datablock throws "Invalid HFile block magic" and can not switch to hdfs checksum

2018-05-29 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-11625:
-

Assignee: Appy  (was: Esteban Gutierrez)

> Reading datablock throws "Invalid HFile block magic" and can not switch to 
> hdfs checksum 
> -
>
> Key: HBASE-11625
> URL: https://issues.apache.org/jira/browse/HBASE-11625
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.94.21, 0.98.4, 0.98.5, 1.0.1.1, 1.0.3
>Reporter: qian wang
>Assignee: Appy
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 1.1.6, 2.0.0
>
> Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz, 
> HBASE-11625-branch-1-v1.patch, HBASE-11625-branch-1.2-v1.patch, 
> HBASE-11625-branch-1.2-v2.patch, HBASE-11625-branch-1.2-v3.patch, 
> HBASE-11625-branch-1.2-v4.patch, HBASE-11625-master-v2.patch, 
> HBASE-11625-master-v3.patch, HBASE-11625-master.patch, 
> HBASE-11625.branch-1.1.001.patch, HBASE-11625.patch, correct-hfile, 
> corrupted-header-hfile
>
>
> When using HBase checksums, a call to {{readBlockDataInternal()}} in 
> HFileBlock.java can encounter file corruption, but it can only switch to the 
> HDFS checksum input stream at {{validateBlockChecksum()}}. If the data 
> block's header is corrupted when {{b = new HFileBlock()}} is constructed, it 
> throws the exception "Invalid HFile block magic" and the RPC call fails





[jira] [Work started] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2020-09-03 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-19352 started by Esteban Gutierrez.
-
> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix there too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2020-09-03 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-19352.
---
Fix Version/s: 2.2.6
   2.4.0
   2.3.3
   3.0.0-alpha-1
 Tags: security
   Resolution: Fixed

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.6
>
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix there too.





[jira] [Created] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k

2020-03-24 Thread Esteban Gutierrez (Jira)
Esteban Gutierrez created HBASE-24041:
-

 Summary: [regression]  Increase RESTServer buffer size back to 64k
 Key: HBASE-24041
 URL: https://issues.apache.org/jira/browse/HBASE-24041
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0, 2.3.0, 2.4.0
Reporter: Esteban Gutierrez


HBASE-14492 is no longer present in our current releases after HBASE-12894. 
Unfortunately our RESTServer does not extend HttpServer, which means that 
{{DEFAULT_MAX_HEADER_SIZE}} is not being set and HTTP requests with a very 
large header can still cause connection issues for clients. A quick fix is just 
to add the settings to the {{HttpConfiguration}} configuration object. A 
long-term solution would be to refactor the services that create an HTTP 
server and normalize all configuration settings across them.





[jira] [Resolved] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k

2020-03-27 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-24041.
---
Fix Version/s: 2.2.5
   2.4.0
   2.3.0
   3.0.0
   Resolution: Fixed

> [regression]  Increase RESTServer buffer size back to 64k
> -
>
> Key: HBASE-24041
> URL: https://issues.apache.org/jira/browse/HBASE-24041
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 3.0.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Esteban Gutierrez
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.4.0, 2.2.5
>
>
> HBASE-14492 is no longer present in our current releases after HBASE-12894. 
> Unfortunately our RESTServer does not extend HttpServer, which means that 
> {{DEFAULT_MAX_HEADER_SIZE}} is not being set and HTTP requests with a very 
> large header can still cause connection issues for clients. A quick fix is 
> just to add the settings to the {{HttpConfiguration}} configuration object. A 
> long-term solution would be to refactor the services that create an HTTP 
> server and normalize all configuration settings across them.





[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file

2016-11-03 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634259#comment-15634259
 ] 

Esteban Gutierrez commented on HBASE-17007:
---

[~enis] I think what [~stack] said is pretty much one of the main problems with 
the current logging that we have in the RSs. I think making this change doesn't 
hurt, and it helps reduce the clutter, especially during startup of the RS.

> Move ZooKeeper logging to its own log file
> --
>
> Key: HBASE-17007
> URL: https://issues.apache.org/jira/browse/HBASE-17007
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: 
> 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch
>
>
> ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a 
> different log file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file

2016-11-03 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634294#comment-15634294
 ] 

Esteban Gutierrez commented on HBASE-17007:
---

[~enis] I think what [~stack] said is pretty much one of the main problems with 
the current logging that we have in the RSs. I think making this change doesn't 
hurt, and it helps reduce the clutter, especially during startup of the RS.

> Move ZooKeeper logging to its own log file
> --
>
> Key: HBASE-17007
> URL: https://issues.apache.org/jira/browse/HBASE-17007
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: 
> 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch
>
>
> ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a 
> different log file.





[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file

2016-11-04 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637883#comment-15637883
 ] 

Esteban Gutierrez commented on HBASE-17007:
---

We thought about removing only the classpath initially, but that would require 
patching ZooKeeper to change the ZK client logging level. Also, ZooKeeper is 
used by some coprocessors like Tephra and Phoenix, and logs get polluted quite 
easily due to other tasks done by those CPs. There is another alternative: 
removing the duplicated classpath from the logs by adding CLASSPATH to the 
list of skip words in ServerCommandLine. However, the CLASSPATH environment 
string is usually shorter than java.class.path as reported by the JVM, which 
is what ZK is dumping. In a quick test, the whole line with java.class.path 
was 63,076 bytes long vs. 14,293 bytes for the string that contains the 
CLASSPATH.

> Move ZooKeeper logging to its own log file
> --
>
> Key: HBASE-17007
> URL: https://issues.apache.org/jira/browse/HBASE-17007
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: 
> 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch
>
>
> ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a 
> different log file.





[jira] [Commented] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split

2016-11-08 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649100#comment-15649100
 ] 

Esteban Gutierrez commented on HBASE-15324:
---

We ran into this in 1.2 with a customer, and it caused tens of thousands of new 
regions to be created in a matter of hours. I'm going to push it from 0.98 to 1.2
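As a minimal, self-contained illustration (plain Java, not HBase code; the method name and the 0.25 jitter value are assumptions) of the wrap-around in the jitter line from {{ConstantSizeRegionSplitPolicy}} quoted below: when MAX_FILESIZE is Long.MAX_VALUE, a positive jitter draw overflows the long, the limit becomes negative, and every region looks oversized.

```java
public class JitterOverflowSketch {
    // Mirrors the shape of the jitter line quoted in the issue description;
    // r stands in for RANDOM.nextFloat().
    static long applyJitter(long desiredMaxFileSize, float r, double jitter) {
        return desiredMaxFileSize
            + (long) (desiredMaxFileSize * (r - 0.5D) * jitter);
    }

    public static void main(String[] args) {
        // With MAX_FILESIZE = Long.MAX_VALUE and any draw above 0.5,
        // the addition wraps around to a large negative value.
        long size = applyJitter(Long.MAX_VALUE, 0.9f, 0.25D);
        System.out.println(size < 0); // true: the split-size limit is now negative
    }
}
```

With an ordinary region size the same line behaves as intended; only sizes near Long.MAX_VALUE wrap, which is why this surfaced when MAX_FILESIZE was set to Long.MAX_VALUE to suppress splits.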

> Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy 
> and trigger unexpected split
> --
>
> Key: HBASE-15324
> URL: https://issues.apache.org/jira/browse/HBASE-15324
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, 
> HBASE-15324_v3.patch, HBASE-15324_v3.patch
>
>
> We introduced jitter for region split decisions in HBASE-13412, but the 
> following line in {{ConstantSizeRegionSplitPolicy}} may cause a long value 
> overflow if MAX_FILESIZE is set to Long.MAX_VALUE:
> {code}
> this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 
> 0.5D) * jitter);
> {code}
> In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target 
> region from splitting.





[jira] [Updated] (HBASE-13412) Region split decisions should have jitter

2016-11-08 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-13412:
--
Fix Version/s: 1.2.5

> Region split decisions should have jitter
> -
>
> Key: HBASE-13412
> URL: https://issues.apache.org/jira/browse/HBASE-13412
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.1.0, 0.98.13
>
> Attachments: HBASE-13412-v1.patch, HBASE-13412-v2.patch, 
> HBASE-13412-v3.patch, HBASE-13412.addendum.0.98-2.patch, 
> HBASE-13412.addendum.0.98.patch, HBASE-13412.patch, hbase-13412.addendum.patch
>
>
> Whenever a region splits it causes lots of IO (compactions are queued for a 
> while). Because of this it's important to make sure that well distributed 
> tables don't have all of their regions split at exactly the same time.
> This is basically the same as our compaction jitter.





[jira] [Updated] (HBASE-13412) Region split decisions should have jitter

2016-11-08 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-13412:
--
Fix Version/s: (was: 1.2.5)

> Region split decisions should have jitter
> -
>
> Key: HBASE-13412
> URL: https://issues.apache.org/jira/browse/HBASE-13412
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.1.0, 0.98.13
>
> Attachments: HBASE-13412-v1.patch, HBASE-13412-v2.patch, 
> HBASE-13412-v3.patch, HBASE-13412.addendum.0.98-2.patch, 
> HBASE-13412.addendum.0.98.patch, HBASE-13412.patch, hbase-13412.addendum.patch
>
>
> Whenever a region splits it causes lots of IO (compactions are queued for a 
> while). Because of this it's important to make sure that well distributed 
> tables don't have all of their regions split at exactly the same time.
> This is basically the same as our compaction jitter.





[jira] [Updated] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split

2016-11-08 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-15324:
--
Fix Version/s: 1.2.5

> Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy 
> and trigger unexpected split
> --
>
> Key: HBASE-15324
> URL: https://issues.apache.org/jira/browse/HBASE-15324
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5
>
> Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, 
> HBASE-15324_v3.patch, HBASE-15324_v3.patch
>
>
> We introduced jitter for region split decisions in HBASE-13412, but the 
> following line in {{ConstantSizeRegionSplitPolicy}} may cause a long value 
> overflow if MAX_FILESIZE is set to Long.MAX_VALUE:
> {code}
> this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 
> 0.5D) * jitter);
> {code}
> In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target 
> region from splitting.





[jira] [Updated] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split

2016-11-08 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-15324:
--
Fix Version/s: 1.1.8

> Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy 
> and trigger unexpected split
> --
>
> Key: HBASE-15324
> URL: https://issues.apache.org/jira/browse/HBASE-15324
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8
>
> Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, 
> HBASE-15324_v3.patch, HBASE-15324_v3.patch
>
>
> We introduced jitter for region split decisions in HBASE-13412, but the 
> following line in {{ConstantSizeRegionSplitPolicy}} may cause a long value 
> overflow if MAX_FILESIZE is set to Long.MAX_VALUE:
> {code}
> this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 
> 0.5D) * jitter);
> {code}
> In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target 
> region from splitting.





[jira] [Commented] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split

2016-11-08 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649258#comment-15649258
 ] 

Esteban Gutierrez commented on HBASE-15324:
---

Not pushing to 0.98 since the jitter added by HBASE-13412 is not enabled by 
default. 

> Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy 
> and trigger unexpected split
> --
>
> Key: HBASE-15324
> URL: https://issues.apache.org/jira/browse/HBASE-15324
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8
>
> Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, 
> HBASE-15324_v3.patch, HBASE-15324_v3.patch
>
>
> We introduced jitter for region split decisions in HBASE-13412, but the 
> following line in {{ConstantSizeRegionSplitPolicy}} may cause a long value 
> overflow if MAX_FILESIZE is set to Long.MAX_VALUE:
> {code}
> this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 
> 0.5D) * jitter);
> {code}
> In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target 
> region from splitting.





[jira] [Created] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-09 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-17058:
-

 Summary: Lower epsilon used for jitter verification from 
HBASE-15324
 Key: HBASE-17058
 URL: https://issues.apache.org/jira/browse/HBASE-17058
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 1.2.4, 1.1.7, 2.0.0, 1.3.0, 1.4.0
Reporter: Esteban Gutierrez


The current epsilon used is 1E-6, and it's too big: it might overflow the 
desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
2^-53. An option to consider too is just to shift the jitter to always 
decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the 
size of the region and having to deal with the round-off.
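A standalone sketch of the point above (not HBase code): a jitter rate that slips under a 1E-6 epsilon check can still scale Long.MAX_VALUE into a positive delta that overflows, while 2^-52 is the actual precision (machine epsilon) of a double:

```java
public class EpsilonTooBig {
    public static void main(String[] args) {
        // 2^-52 is the spacing of doubles around 1.0 (machine epsilon).
        System.out.println(Math.ulp(1.0) == 1.0 / (1L << 52)); // true

        // A jitter rate small enough to pass a Math.abs(rate) < 1E-6 check...
        double jitterRate = 1e-7;
        long delta = (long) ((double) Long.MAX_VALUE * jitterRate);

        // ...still produces a positive delta that overflows the max long.
        System.out.println(delta > 0);                  // true
        System.out.println(Long.MAX_VALUE + delta < 0); // true
    }
}
```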






[jira] [Commented] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split

2016-11-09 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652810#comment-15652810
 ] 

Esteban Gutierrez commented on HBASE-15324:
---

[~huaxiang] I think the problem is the value of the epsilon used for the 
precision of the types involved (float x double).  I think it should be at 
least 2.22e-16 (2^-52) or even 1.11e-16 (2^-53).  Created HBASE-17058 for 
follow up. Thanks.

> Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy 
> and trigger unexpected split
> --
>
> Key: HBASE-15324
> URL: https://issues.apache.org/jira/browse/HBASE-15324
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8
>
> Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, 
> HBASE-15324_v3.patch, HBASE-15324_v3.patch
>
>
> We introduced jitter for the region split decision in HBASE-13412, but the 
> following line in {{ConstantSizeRegionSplitPolicy}} may cause long value 
> overflow if MAX_FILESIZE is specified as Long.MAX_VALUE:
> {code}
> this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - 
> 0.5D) * jitter);
> {code}
> In our case we set MAX_FILESIZE to Long.MAX_VALUE to prevent the target 
> region from splitting.





[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-10 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656368#comment-15656368
 ] 

Esteban Gutierrez commented on HBASE-17072:
---

Thinking about this problem as described by [~sato_eiichi], I think there is 
some value in optionally disabling the prefetching of headers for some 
workloads (lots of regions, very large HFiles, SSDs), and it could be done via 
an HCD like PREFETCH_BLOCKS_ON_OPEN. However, regarding the CPU usage, I 
think the counters and our per-region metrics are very expensive in general.

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: disable-block-header-cache.patch, mat-threadlocals.png, 
> mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.
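The growth pattern described above can be sketched outside HBase. This hypothetical stand-in keeps one ThreadLocal per "file", as the per-HFile PrefetchedHeader caching does; the count and payload size are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalPerFile {
    public static void main(String[] args) {
        // Each ThreadLocal instance adds one entry to the ThreadLocalMap of
        // every thread that touches it, so the map grows with the file count.
        List<ThreadLocal<byte[]>> perFileHeaderCaches = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            ThreadLocal<byte[]> cache = new ThreadLocal<>();
            cache.set(new byte[33]);         // a small cached "header" payload
            perFileHeaderCaches.add(cache);  // strong refs, like open readers hold
        }
        // This thread's ThreadLocalMap now holds ~10,000 live entries that
        // expungeStaleEntries() must walk past on every cleanup pass.
        System.out.println(perFileHeaderCaches.size()); // 10000
    }
}
```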





[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-10 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656372#comment-15656372
 ] 

Esteban Gutierrez commented on HBASE-17072:
---

[~sato_eiichi], take a look at HBASE-17017; removing the per-region 
metrics could probably help.

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: disable-block-header-cache.patch, mat-threadlocals.png, 
> mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.





[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC

2016-11-10 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656380#comment-15656380
 ] 

Esteban Gutierrez commented on HBASE-17072:
---

[~anoopsamjohn], yeah, Stack fixed the creation of new instances but we still 
prefetch one header. With 1000s of regions I think you can still have lots of 
instances around.

> CPU usage starts to climb up to 90-100% when using G1GC
> ---
>
> Key: HBASE-17072
> URL: https://issues.apache.org/jira/browse/HBASE-17072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, regionserver
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Eiichi Sato
> Attachments: disable-block-header-cache.patch, mat-threadlocals.png, 
> mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts 
> to gradually get higher up to nearly 90-100% when using G1GC.  We've also run 
> into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen 
> after restarting a RS.  We reproduced this on our test cluster and attached 
> the results.  Please note that, to make it easy to reproduce, we did some 
> "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test 
> cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. 
>  Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of 
> each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster 
> running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of 
> contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary 
> clustering.  This caused more loops in 
> {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU 
> time.  What is worse is that the method is called from RPC metrics code, 
> which means even a small amount of per-RPC time soon adds up to a huge amount 
> of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many 
> {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances.  
> Here are some OQL counts from Eclipse Memory Analyzer (MAT).  This shows a 
> number of ThreadLocal instances in the ThreadLocalMap of a single handler 
> thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = 
> "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
> FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although as described in HBASE-16616 this somewhat seems to be an issue in 
> G1GC side regarding weakly-reachable objects, we should keep ThreadLocal 
> usage minimal and avoid creating an indefinite number (in this case, a number 
> of HFiles) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code.  That may solve 
> the issue (I just saw the patch, never tested it at all), but the 
> {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which 
> may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and 
> fortunately we didn't notice any performance degradation for our production 
> workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are 
> handled randomly in any of the handlers, small Get or small Scan RPCs do not 
> benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).  
> Probably, we need to see how well reads are saved by the caching for large 
> Scan or Get RPCs and especially for compactions if we really remove the 
> caching. It's probably better if we can remove ThreadLocals without breaking 
> the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.





[jira] [Updated] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-17058:
--
Status: Patch Available  (was: Open)

> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 1.2.4, 1.1.7, 2.0.0, 1.3.0, 1.4.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, and it's too big: it might overflow the 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider too is just to shift the jitter to always 
> decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the 
> size of the region and having to deal with the round-off.





[jira] [Updated] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-17058:
--
Attachment: HBASE-17058.master.001.patch

[~carp84] & [~huaxiang]: attached a patch with the jitterRate > 0 approach; it is 
simpler to understand.

> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, and it's too big: it might overflow the 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider too is just to shift the jitter to always 
> decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the 
> size of the region and having to deal with the round-off.





[jira] [Commented] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656689#comment-15656689
 ] 

Esteban Gutierrez commented on HBASE-17058:
---

Thanks for the review [~carp84], I will commit tomorrow if there is no other 
objection. 

> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, and it's too big: it might overflow the 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider too is just to shift the jitter to always 
> decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the 
> size of the region and having to deal with the round-off.





[jira] [Commented] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657581#comment-15657581
 ] 

Esteban Gutierrez commented on HBASE-17058:
---

No unit tests needed; the existing unit test from HBASE-15324 is sufficient.


> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, and it's too big: it might overflow the 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider too is just to shift the jitter to always 
> decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the 
> size of the region and having to deal with the round-off.





[jira] [Updated] (HBASE-16345) RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions

2016-11-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-16345:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Marking this as resolved.

> RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer 
> Exceptions
> --
>
> Key: HBASE-16345
> URL: https://issues.apache.org/jira/browse/HBASE-16345
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.0, 1.3.0, 1.0.3, 1.4.0, 1.2.3, 1.1.7
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16345-v001.patch, HBASE-16345.branch-1.001.patch, 
> HBASE-16345.branch-1.001.patch, HBASE-16345.master.001.patch, 
> HBASE-16345.master.002.patch, HBASE-16345.master.003.patch, 
> HBASE-16345.master.004.patch, HBASE-16345.master.005.patch, 
> HBASE-16345.master.005.patch, HBASE-16345.master.006.patch
>
>
> Updated the description after debugging more on this front, based on the 
> comments from Enis. 
> The cause is that for the primary replica, if its retry is exhausted too 
> fast, f.get() [1] returns ExecutionException. This exception needs to be 
> ignored so the call can continue with the replicas.
> The other issue is that after adding calls for the replicas, if the first 
> completed task gets ExecutionException (due to retries being exhausted), it 
> throws the exception to the client [2].
> In this case, it needs to loop through these tasks, waiting for a successful 
> one. If none succeeds, throw an exception.
> The same applies to scans.
> [1] 
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L197
> [2] 
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L219
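The loop-until-success behavior described above can be sketched with an ExecutorCompletionService: take completed tasks in order, swallow per-task ExecutionExceptions, and return the first success. The names here (firstSuccess, the task bodies) are illustrative assumptions, not HBase's actual implementation:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FirstSuccess {
    // Waits on completed tasks and returns the first successful result,
    // ignoring ExecutionExceptions (e.g. a replica that exhausted retries).
    static <T> T firstSuccess(ExecutorCompletionService<T> cs, int tasks)
            throws InterruptedException {
        ExecutionException last = null;
        for (int i = 0; i < tasks; i++) {
            try {
                return cs.take().get(); // first task to complete successfully wins
            } catch (ExecutionException e) {
                last = e; // this call failed; keep waiting on the others
            }
        }
        throw new RuntimeException("all replicas failed", last);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
        cs.submit(() -> { throw new RuntimeException("primary retries exhausted"); });
        cs.submit(() -> "replica-1 result");
        System.out.println(firstSuccess(cs, 2)); // prints "replica-1 result"
        pool.shutdown();
    }
}
```

Whichever task completes first, the failed primary is skipped and the replica's result is returned instead of surfacing the ExecutionException to the caller.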





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Labels: n  (was: )

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 
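One possible way to parse the shell-style quoted list suggested above: extract each single-quoted column spec instead of splitting on spaces. This is an illustrative sketch only; ColumnSpecParser, parse, and the regex are assumptions, not part of TableInputFormat:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColumnSpecParser {
    // Pulls each 'family:qualifier' spec out of a shell-style quoted list,
    // so qualifiers may contain spaces.
    static List<String> parse(String spec) {
        List<String> cols = new ArrayList<>();
        Matcher m = Pattern.compile("'([^']*)'").matcher(spec);
        while (m.find()) {
            cols.add(m.group(1));
        }
        return cols;
    }

    public static void main(String[] args) {
        System.out.println(parse("['myColumn:some key','myOtherColumn:key']"));
        // prints [myColumn:some key, myOtherColumn:key]
    }
}
```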





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Labels: newbie  (was: n)

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Affects Version/s: 2.0.0

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Labels: beginner  (was: newbie)

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Affects Version/s: (was: 0.19.1)

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 





[jira] [Updated] (HBASE-1346) Split column names using a delimeter other than space for TableInputFormat

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-1346:
-
Component/s: mapreduce

> Split column names using a delimeter other than space for TableInputFormat 
> ---
>
> Key: HBASE-1346
> URL: https://issues.apache.org/jira/browse/HBASE-1346
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0
>Reporter: Justin Becker
>  Labels: beginner
>
> Split column names using a delimiter other than space for TableInputFormat.  
> The configure(JobConf) method currently splits column names by the space 
> character.  This prevents scanning by columns where the qualifier contains a 
> space.  For example, "myColumn:some key".  To be consistent with the shell 
> maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" 





[jira] [Resolved] (HBASE-2213) HCD should only have those fields explicitly set by user while creating tables

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-2213.
--
Resolution: Won't Fix

Stale; re-open if you consider that this still needs to be implemented.

> HCD should only have those fields explicitly set by user while creating tables
> --
>
> Key: HBASE-2213
> URL: https://issues.apache.org/jira/browse/HBASE-2213
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.20.3
>Reporter: ryan rawson
>
> right now we take the default HCD fields and 'snapshot' them into every HCD.  
> So things like 'BLOCKCACHE' and 'FILESIZE' are in every table, even if they 
> don't differ from the defaults.  If the default changes in a 
> meaningful/important way, the user is left with the unenviable task of (a) 
> determining this happened and (b) actually going through and 
> disabling/altering the tables to fix it.





[jira] [Resolved] (HBASE-2376) Add special SnapshotScanner which presents view of all data at some time in the past

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-2376.
--
Resolution: Later

Equivalent functionality can be achieved by using HBASE-4536 and HBASE-4071; if you 
think this is still necessary, please re-open.

> Add special SnapshotScanner which presents view of all data at some time in 
> the past
> 
>
> Key: HBASE-2376
> URL: https://issues.apache.org/jira/browse/HBASE-2376
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver
>Affects Versions: 0.20.3
>Reporter: Jonathan Gray
>Assignee: Pritam Damania
>
> In order to support a particular kind of database "snapshot" feature which 
> doesn't require copying data, we came up with the idea for a special 
> SnapshotScanner that would present a view of your data at some point in the 
> past.  The primary use case for this would be to be able to recover 
> particular data/rows (but not all data, like a global rollback) should they 
> have somehow been messed up (application fault, application bug, user error, 
> etc.).





[jira] [Resolved] (HBASE-2434) Add scanner caching option to Export and write buffer option for Import

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-2434.
--
Resolution: Won't Fix

No longer relevant; superseded by the BufferedMutator and [~yangzhe1991]'s 
rationalization of scanner sizing and timing.

> Add scanner caching option to Export and write buffer option for Import
> ---
>
> Key: HBASE-2434
> URL: https://issues.apache.org/jira/browse/HBASE-2434
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.20.3
>Reporter: Ted Yu
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> An option specifying the number of rows to fetch each time we hit a region server should 
> be added to mapreduce.Export so that createSubmittableJob() calls 
> s.setCaching() with the specified value.
> Also, an option of write buffer size should be added to mapreduce.Import so 
> that we can set write buffer. Sample calls:
> table.setAutoFlush(false);
> table.setWriteBufferSize(desired_buffer_size);





[jira] [Resolved] (HBASE-2535) split hostname format should be consistent with tasktracker for locality

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-2535.
--
Resolution: Duplicate

resolved by HBASE-7693

> split hostname format should be consistent with tasktracker for locality
> 
>
> Key: HBASE-2535
> URL: https://issues.apache.org/jira/browse/HBASE-2535
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 0.20.4
>Reporter: John Sichi
>
> I was running a mapreduce job (via Hive) against HBase, and noticed that I 
> wasn't getting any locality (the input split location and the task tracker 
> machine in the job tracker UI were always different, and "Rack-local map 
> tasks" in the job counters was 0).
> I tracked this down to a discrepancy in the way hostnames were being compared.
> The task tracker detail had a Host like
> /f/s/1.2.3.4/h.s.f.com.
> (with trailing dot)
> But the Input Split Location had
> /f/s/1.2.3.4/h.s.f.com
> (without trailing dot)





[jira] [Resolved] (HBASE-3307) Add checkAndPut to the Thrift API

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3307.
--
Resolution: Duplicate

dup of HBASE-10960

> Add checkAndPut to the Thrift API
> -
>
> Key: HBASE-3307
> URL: https://issues.apache.org/jira/browse/HBASE-3307
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Affects Versions: 0.89.20100924
>Reporter: Chris Tarnas
>Priority: Minor
>  Labels: thrift
>
> It would be very useful to have the checkAndPut method available via the 
> Thrift API. This would both allow for easier atomic updates as well as cut 
> down on at least one Thrift roundtrip for quite a few common tasks. 





[jira] [Resolved] (HBASE-3432) [hbck] Add "remove table" switch

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3432.
--
Resolution: Won't Fix

Closing as stale; not seen in a long time.

> [hbck] Add "remove table" switch
> 
>
> Key: HBASE-3432
> URL: https://issues.apache.org/jira/browse/HBASE-3432
> Project: HBase
>  Issue Type: New Feature
>  Components: util
>Affects Versions: 0.89.20100924
>Reporter: Lars George
>Priority: Minor
>
> This happened before and I am not sure how the new Master improves on it 
> (this stuff is only available between the lines or buried in some people's 
> heads - one other thing I wish we had is a better place to communicate what 
> each patch improves). Just so we do not miss it, there is an issue that 
> sometimes disabling large tables simply times out and the table gets stuck in 
> limbo. 
> From the CDH User list:
> {quote}
> On Fri, Jan 7, 2011 at 1:57 PM, Sean Sechrist  wrote:
> To get them out of META, you can just scan '.META.' for that table name, and 
> delete those rows. We had to do that a few months ago.
> -Sean
> That did it.  For the benefit of others, here's code.  Beware the literal 
> table names, run at your own peril.
> {quote}
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.MetaScanner;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class CleanFromMeta {
>
>   public static class Cleaner implements MetaScanner.MetaScannerVisitor {
>     public HTable meta = null;
>
>     public Cleaner(Configuration conf) throws IOException {
>       meta = new HTable(conf, ".META.");
>     }
>
>     public boolean processRow(Result rowResult) throws IOException {
>       String r = new String(rowResult.getRow());
>       if (r.startsWith("webtable,")) {
>         meta.delete(new Delete(rowResult.getRow()));
>         System.out.println("Deleting row " + rowResult);
>       }
>       return true;
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     MetaScanner.metaScan(conf, new Cleaner(conf), Bytes.toBytes("webtable"));
>   }
> }
> {code}
> I suggest to move this into HBaseFsck. I do not like personally to have these 
> JRuby scripts floating around that may or may not help. This should be 
> available if a user gets stuck and knows what he is doing (they can delete 
> from .META. anyways). Maybe a "\-\-disable-table  \-\-force" or 
> so? But since disable is already in the shell we could add an "\-\-force" 
> there? Or add a "\-\-delete-table " to the hbck?





[jira] [Updated] (HBASE-3457) Auto-tune some GC settings

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3457:
-
Component/s: (was: scripts)
 Operability

> Auto-tune some GC settings
> --
>
> Key: HBASE-3457
> URL: https://issues.apache.org/jira/browse/HBASE-3457
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 2.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hbase-3457.txt, hbase-3457.txt, hbase-env.sh
>
>
> The settings we ship with aren't really optimal for an actual deployment. We 
> can take a look at some things like /proc/cpuinfo and figure out whether to 
> enable parallel GC, turn off CMSIncrementalMode, etc.





[jira] [Updated] (HBASE-3457) Auto-tune some GC settings

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3457:
-
Affects Version/s: (was: 0.90.1)
   2.0.0

> Auto-tune some GC settings
> --
>
> Key: HBASE-3457
> URL: https://issues.apache.org/jira/browse/HBASE-3457
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 2.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hbase-3457.txt, hbase-3457.txt, hbase-env.sh
>
>
> The settings we ship with aren't really optimal for an actual deployment. We 
> can take a look at some things like /proc/cpuinfo and figure out whether to 
> enable parallel GC, turn off CMSIncrementalMode, etc.





[jira] [Commented] (HBASE-3457) Auto-tune some GC settings

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671459#comment-15671459
 ] 

Esteban Gutierrez commented on HBASE-3457:
--

We should improve this and also include off-heap sizing.

> Auto-tune some GC settings
> --
>
> Key: HBASE-3457
> URL: https://issues.apache.org/jira/browse/HBASE-3457
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Affects Versions: 2.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hbase-3457.txt, hbase-3457.txt, hbase-env.sh
>
>
> The settings we ship with aren't really optimal for an actual deployment. We 
> can take a look at some things like /proc/cpuinfo and figure out whether to 
> enable parallel GC, turn off CMSIncrementalMode, etc.





[jira] [Updated] (HBASE-3482) [REST] Add documentation for filter definition in JSON

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3482:
-
Affects Version/s: (was: 0.90.0)
   2.0.0

> [REST] Add documentation for filter definition in JSON
> --
>
> Key: HBASE-3482
> URL: https://issues.apache.org/jira/browse/HBASE-3482
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 2.0.0
>Reporter: Lars George
>Priority: Minor
>
> Copied from email in dev@:
> Am I wrong or is there a lack of documentation for the FilterModel for the 
> filters in Stargate? The Wiki http://wiki.apache.org/hadoop/Hbase/HbaseRest 
> points to an old 0.20.4 documentation (although saying it is the new place) 
> and the other page we have is void of details on the filters for scans, i.e. 
> http://wiki.apache.org/hadoop/Hbase/Stargate
> They are implemented in https://issues.apache.org/jira/browse/HBASE-1696 
> (also see the linked https://issues.apache.org/jira/browse/HBASE-2274) but no 
> description is public. I used a little helper to get the details like so
> {code}
> import org.apache.hadoop.hbase.filter.BinaryComparator;
> import org.apache.hadoop.hbase.filter.CompareFilter;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.RowFilter;
> import org.apache.hadoop.hbase.rest.model.ScannerModel;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class TestFilter {
>   public static void main(String[] args) {
>     Filter f = new RowFilter(CompareFilter.CompareOp.EQUAL,
>         new BinaryComparator(Bytes.toBytes("testrow")));
>     try {
>       System.out.println(ScannerModel.stringifyFilter(f));
>     } catch (Exception e) {
>       e.printStackTrace();
>     }
>   }
> }
> {code}
> giving
> {code}
> {"op":"EQUAL","type":"RowFilter","comparator":{"value":"dGVzdHJvdw==","type":"BinaryComparator"}}
> {code}
> Obviously this can be also seen from the FilterModel class but I assume we 
> need some documentation on that Stargate page?
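The JSON shape quoted above can be reproduced without an HBase install. Below is a plain-Java sketch (the class and method names are illustrative, not HBase API; only the JSON layout and the base64 encoding of the comparator value follow what ScannerModel printed above):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RowFilterJson {
    // Builds the same JSON document ScannerModel.stringifyFilter() printed
    // above: the comparator value is the base64-encoded row key.
    static String rowFilterJson(String rowKey) {
        String encoded = Base64.getEncoder()
                .encodeToString(rowKey.getBytes(StandardCharsets.UTF_8));
        return "{\"op\":\"EQUAL\",\"type\":\"RowFilter\","
                + "\"comparator\":{\"value\":\"" + encoded
                + "\",\"type\":\"BinaryComparator\"}}";
    }

    public static void main(String[] args) {
        // "testrow" encodes to dGVzdHJvdw==, matching the output quoted above.
        System.out.println(rowFilterJson("testrow"));
    }
}
```

This makes explicit that the opaque-looking "dGVzdHJvdw==" in the quoted output is just base64 of the row key bytes.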





[jira] [Updated] (HBASE-3482) [REST] Add documentation for filter definition in JSON

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3482:
-
Summary: [REST] Add documentation for filter definition in JSON  (was: 
[Stargate] Add documentation for filter definition in JSON)

> [REST] Add documentation for filter definition in JSON
> --
>
> Key: HBASE-3482
> URL: https://issues.apache.org/jira/browse/HBASE-3482
> Project: HBase
>  Issue Type: Improvement
>  Components: REST
>Affects Versions: 2.0.0
>Reporter: Lars George
>Priority: Minor
>
> Copied from email in dev@:
> Am I wrong or is there a lack of documentation for the FilterModel for the 
> filters in Stargate? The Wiki http://wiki.apache.org/hadoop/Hbase/HbaseRest 
> points to an old 0.20.4 documentation (although saying it is the new place) 
> and the other page we have is void of details on the filters for scans, i.e. 
> http://wiki.apache.org/hadoop/Hbase/Stargate
> They are implemented in https://issues.apache.org/jira/browse/HBASE-1696 
> (also see the linked https://issues.apache.org/jira/browse/HBASE-2274) but no 
> description is public. I used a little helper to get the details like so
> {code}
> import org.apache.hadoop.hbase.filter.BinaryComparator;
> import org.apache.hadoop.hbase.filter.CompareFilter;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.RowFilter;
> import org.apache.hadoop.hbase.rest.model.ScannerModel;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class TestFilter {
>   public static void main(String[] args) {
>     Filter f = new RowFilter(CompareFilter.CompareOp.EQUAL,
>         new BinaryComparator(Bytes.toBytes("testrow")));
>     try {
>       System.out.println(ScannerModel.stringifyFilter(f));
>     } catch (Exception e) {
>       e.printStackTrace();
>     }
>   }
> }
> {code}
> giving
> {code}
> {"op":"EQUAL","type":"RowFilter","comparator":{"value":"dGVzdHJvdw==","type":"BinaryComparator"}}
> {code}
> Obviously this can be also seen from the FilterModel class but I assume we 
> need some documentation on that Stargate page?





[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671544#comment-15671544
 ] 

Esteban Gutierrez commented on HBASE-3562:
--

[~Apache9] any thoughts on this? It seems that the unit tests in the patch 
would be helpful, and there is an addition to the ColumnTracker that might be 
useful in other cases.

> ValueFilter is being evaluated before performing the column match
> -
>
> Key: HBASE-3562
> URL: https://issues.apache.org/jira/browse/HBASE-3562
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 0.90.0, 0.94.7
>Reporter: Evert Arckens
> Attachments: HBASE-3562.patch
>
>
> When performing a Get operation where a both a column is specified and a 
> ValueFilter, the ValueFilter is evaluated before making the column match as 
> is indicated in the javadoc of Get.setFilter()  : " {@link 
> Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column 
> match, deletes and max versions have been run. "
> This is shown in the little test below, which uses a TestComparator extending 
> a WritableByteArrayComparable.
> public void testFilter() throws Exception {
>   byte[] cf = Bytes.toBytes("cf");
>   byte[] row = Bytes.toBytes("row");
>   byte[] col1 = Bytes.toBytes("col1");
>   byte[] col2 = Bytes.toBytes("col2");
>   Put put = new Put(row);
>   put.add(cf, col1, new byte[]{(byte)1});
>   put.add(cf, col2, new byte[]{(byte)2});
>   table.put(put);
>   Get get = new Get(row);
>   get.addColumn(cf, col2); // We only want to retrieve col2
>   TestComparator testComparator = new TestComparator();
>   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
>   get.setFilter(filter);
>   Result result = table.get(get);
> }
> public class TestComparator extends WritableByteArrayComparable {
> /**
>  * Nullary constructor, for Writable
>  */
> public TestComparator() {
> super();
> }
> 
> @Override
> public int compareTo(byte[] theirValue) {
> if (theirValue[0] == (byte)1) {
> // If the column match was done before evaluating the filter, we 
> should never get here.
> throw new RuntimeException("I only expect (byte)2 in col2, not 
> (byte)1 from col1");
> }
> if (theirValue[0] == (byte)2) {
> return 0;
> }
> else return 1;
> }
> }
> When only one column should be retrieved, this can be worked around by using 
> a SingleColumnValueFilter instead of the ValueFilter.
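The contract quoted from the Get.setFilter() javadoc (column match before filter evaluation) can be illustrated with a dependency-free toy. This is not HBase code, just a sketch of the ordering the javadoc promises:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class EvalOrder {
    // Toy scan: the column match runs BEFORE the value filter, so the
    // filter never sees values from columns that were not requested.
    static Integer get(Map<String, Integer> row, String wantedCol,
                       Predicate<Integer> valueFilter) {
        for (Map.Entry<String, Integer> cell : row.entrySet()) {
            if (!cell.getKey().equals(wantedCol)) {
                continue; // column match happens first
            }
            if (valueFilter.test(cell.getValue())) {
                return cell.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> row = new LinkedHashMap<>();
        row.put("col1", 1);
        row.put("col2", 2);
        // A filter that throws on col1's value never sees it, because
        // only col2 survives the column match.
        Integer v = get(row, "col2", x -> {
            if (x == 1) throw new IllegalStateException("filter saw col1");
            return x == 2;
        });
        System.out.println(v); // 2
    }
}
```

In the bug reported here, HBase behaves like a version of this loop with the two checks swapped, which is why the reporter's comparator sees col1's value.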





[jira] [Resolved] (HBASE-3725) HBase increments from old value after delete and write to disk

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3725.
--
Resolution: Resolved

Resolving per last comment from [~larsh]

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: ShiXing
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
>* Should result in 

[jira] [Comment Edited] (HBASE-3725) HBase increments from old value after delete and write to disk

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671560#comment-15671560
 ] 

Esteban Gutierrez edited comment on HBASE-3725 at 11/16/16 8:41 PM:


Resolving per last comment from [~lhofhansl]


was (Author: esteban):
Resolving per last comment from [~larsh]

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: ShiXing
> Fix For: 0.92.3
>
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toL

[jira] [Resolved] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3778.
--
Resolution: Duplicate

> HBaseAdmin.create doesn't create empty boundary keys
> 
>
> Key: HBASE-3778
> URL: https://issues.apache.org/jira/browse/HBASE-3778
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: Ted Dunning
> Attachments: HBASE-3778.patch
>
>
> In my ycsb stuff, I have code that looks like this:
> {code}
> String startKey = "user102000";
> String endKey = "user94000";
> admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
> regions);
> {code}
> The result, however, is a table where the first and last region has defined 
> first and last keys rather than empty keys.
> The patch I am about to attach fixes this, I think.  I have some worries 
> about other uses of Bytes.split, however, and would like some eyes on this 
> patch.  Perhaps we need a new dialect of split.
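The interpolation Bytes.split performs can be sketched in plain Java. This is a simplified, illustrative take, not the real Bytes.split: it assumes both keys are already padded to the same length, and it does not handle the empty-key endpoints this report asks for.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Simplified split-point interpolation: treat the (equal-length,
    // unsigned) start and end keys as big integers and return the
    // numRegions - 1 evenly spaced boundary keys strictly between them.
    static List<byte[]> split(byte[] start, byte[] end, int numRegions) {
        BigInteger lo = new BigInteger(1, start);
        BigInteger range = new BigInteger(1, end).subtract(lo);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger k = lo.add(range.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(numRegions)));
            byte[] raw = k.toByteArray();
            // toByteArray() may add a sign byte or drop leading zeros;
            // copy right-aligned back into the fixed key width.
            byte[] key = new byte[start.length];
            int n = Math.min(raw.length, key.length);
            System.arraycopy(raw, raw.length - n, key, key.length - n, n);
            splits.add(key);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] k : split(new byte[]{0x00}, new byte[]{0x10}, 4)) {
            System.out.println(k[0]); // 4, 8, 12
        }
    }
}
```

The fix discussed above amounts to keeping the interior boundaries from this kind of interpolation while emitting empty byte arrays for the first and last region boundaries.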





[jira] [Resolved] (HBASE-3782) Multi-Family support for bulk upload tools causes File Not Found Exception

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3782.
--
Resolution: Won't Fix

Should be fixed by atomic bulk loading from HBASE-4552

> Multi-Family support for bulk upload tools causes File Not Found Exception
> --
>
> Key: HBASE-3782
> URL: https://issues.apache.org/jira/browse/HBASE-3782
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.3
>Reporter: Nichole Treadway
> Attachments: HBASE-3782.patch
>
>
> I've been testing HBASE-1861 in 0.90.2, which adds multi-family support for 
> bulk upload tools.
> I found that when running the importtsv program, some reduce tasks fail with 
> a File Not Found exception if there are no keys in the input data which fall 
> into the region assigned to that reduce task.  From what I can determine, it 
> seems that an output directory is created in the write() method and expected 
> to exist in the writeMetaData() method...if there are no keys to be written 
> for that reduce task, the write method is never called and the output 
> directory is never created, but writeMetaData is expecting the output 
> directory to exist...thus the FnF exception:
> 2011-03-17 11:52:48,095 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> running child
> java.io.FileNotFoundException: File does not exist: 
> hdfs://master:9000/awardsData/_temporary/_attempt_201103151859_0066_r_00_0
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:468)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile.getUniqueFile(StoreFile.java:580)
>   at 
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.writeMetaData(HFileOutputFormat.java:186)
>   at 
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.close(HFileOutputFormat.java:247)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Simply checking if the file exists should fix the issue. 





[jira] [Resolved] (HBASE-3786) Enhance MasterCoprocessorHost to include notification of balancing of each region

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3786.
--
Resolution: Won't Fix

HBASE-4552 was closed, and as [~apurtell] stated in HBASE-3529, NGDATA's 
hbase-indexer already gives us indexing functionality that relies on our 
replication infra.

> Enhance MasterCoprocessorHost to include notification of balancing of each 
> region
> -
>
> Key: HBASE-3786
> URL: https://issues.apache.org/jira/browse/HBASE-3786
> Project: HBase
>  Issue Type: Improvement
>  Components: Coprocessors
>Affects Versions: 0.90.2
>Reporter: Ted Yu
>






[jira] [Resolved] (HBASE-3791) Display total number of zookeeper connections on master.jsp

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3791.
--
Resolution: Fixed

zk.jsp (ZKUtil.dump(), see HBASE-2692) from the Master UI already provides the 
total number of open connections to ZK.

> Display total number of zookeeper connections on master.jsp
> ---
>
> Key: HBASE-3791
> URL: https://issues.apache.org/jira/browse/HBASE-3791
> Project: HBase
>  Issue Type: Improvement
>  Components: Zookeeper
>Affects Versions: 0.90.2
>Reporter: Ted Yu
> Attachments: 3791.patch
>
>
> Quite often, user needs to telnet to Zookeeper and type 'stats' to get the 
> connections, or count the connections on zk.jsp
> We should display the total number of connections beside the link to zk.jsp 
> on master.jsp





[jira] [Resolved] (HBASE-3792) TableInputFormat leaks ZK connections

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3792.
--
Resolution: Won't Fix

> TableInputFormat leaks ZK connections
> -
>
> Key: HBASE-3792
> URL: https://issues.apache.org/jira/browse/HBASE-3792
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.1
> Environment: Java 1.6.0_24, Mac OS X 10.6.7
>Reporter: Bryan Keller
> Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and 
> it never cleans it up. When running a Mapper, the TableInputFormat is 
> instantiated and the ZK connection is created. While this connection is not 
> explicitly cleaned up, the Mapper process eventually exits and thus the 
> connection is closed. Ideally the TableRecordReader would close the 
> connection in its close() method rather than relying on the process to die 
> for connection cleanup. This is fairly easy to implement by overriding 
> TableRecordReader, and also overriding TableInputFormat to specify the new 
> record reader.
> The leak occurs when the JobClient is initializing and needs to retrieve the 
> splits. To get the splits, it instantiates a TableInputFormat. Doing so 
> creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
> my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does 
> not initialize the HTable in the getConf() method and does not have an HTable 
> member variable. Rather, it has a variable for the table name. The HTable is 
> instantiated where needed and then cleaned up. For example, in the 
> getSplits() method, I create the HTable, then close the connection once the 
> splits are retrieved. I also create the HTable when creating the record 
> reader, and I have a record reader that closes the connection when done.





[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671663#comment-15671663
 ] 

Esteban Gutierrez commented on HBASE-3792:
--

The ZK connection leaks have been addressed multiple times over the last few 
years, see HBASE-14485, HBASE-15803, HBASE-16117. Also, most of the connection 
manager code has been refactored.

> TableInputFormat leaks ZK connections
> -
>
> Key: HBASE-3792
> URL: https://issues.apache.org/jira/browse/HBASE-3792
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.90.1
> Environment: Java 1.6.0_24, Mac OS X 10.6.7
>Reporter: Bryan Keller
> Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and 
> it never cleans it up. When running a Mapper, the TableInputFormat is 
> instantiated and the ZK connection is created. While this connection is not 
> explicitly cleaned up, the Mapper process eventually exits and thus the 
> connection is closed. Ideally the TableRecordReader would close the 
> connection in its close() method rather than relying on the process to die 
> for connection cleanup. This is fairly easy to implement by overriding 
> TableRecordReader, and also overriding TableInputFormat to specify the new 
> record reader.
> The leak occurs when the JobClient is initializing and needs to retrieve the 
> splits. To get the splits, it instantiates a TableInputFormat. Doing so 
> creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
> my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does 
> not initialize the HTable in the getConf() method and does not have an HTable 
> member variable. Rather, it has a variable for the table name. The HTable is 
> instantiated where needed and then cleaned up. For example, in the 
> getSplits() method, I create the HTable, then close the connection once the 
> splits are retrieved. I also create the HTable when creating the record 
> reader, and I have a record reader that closes the connection when done.





[jira] [Updated] (HBASE-3854) [thrift] broken thrift examples

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3854:
-
Summary: [thrift] broken thrift examples  (was: broken examples)

> [thrift] broken thrift examples
> ---
>
> Key: HBASE-3854
> URL: https://issues.apache.org/jira/browse/HBASE-3854
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Affects Versions: 0.20.0
>Reporter: Alexey Diomin
>Priority: Minor
>
> We introduce NotFound exception in HBASE-1292, but we drop it in HBASE-1367.
> In result:
> 1. incorrect doc in Hbase.thrift and, as a result, in the generated Java and Javadoc
> 2. broken examples in src/examples/thrift/





[jira] [Resolved] (HBASE-3854) [thrift] broken thrift examples

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3854.
--
Resolution: Later

Resolving as later for now. We should fix coverage on the hbase-examples 
module. At least the code generation for php, perl and others seems to work.

> [thrift] broken thrift examples
> ---
>
> Key: HBASE-3854
> URL: https://issues.apache.org/jira/browse/HBASE-3854
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Affects Versions: 0.20.0
>Reporter: Alexey Diomin
>Priority: Minor
>
> We introduce NotFound exception in HBASE-1292, but we drop it in HBASE-1367.
> In result:
> 1. incorrect doc in Hbase.thrift and, as a result, in the generated Java and Javadoc
> 2. broken examples in src/examples/thrift/





[jira] [Updated] (HBASE-3859) Increment a counter when a Scanner lease expires

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3859:
-
Tags: monitoring, metrics  (was: monitoring, metrics, b)

> Increment a counter when a Scanner lease expires
> 
>
> Key: HBASE-3859
> URL: https://issues.apache.org/jira/browse/HBASE-3859
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Affects Versions: 2.0.0
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-3859.trunk.v1.patch
>
>
> Whenever a Scanner lease expires, the RegionServer will close it 
> automatically and log a message to complain.  I would like the RegionServer 
> to increment a counter whenever this happens and expose this counter through 
> the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
> and keep track of how frequently this happens.  It's not supposed to happen 
> frequently so it's good to keep an eye on it.
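A minimal sketch of the proposed counter, assuming a hypothetical lease-expiration hook; the class and method names are illustrative, not the actual RegionServer internals or its metrics wiring.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScannerLeaseMetrics {
    // Counter to expose through the metrics system so a monitoring tool
    // (e.g. OpenTSDB) can track how often scanner leases expire.
    static final AtomicLong expiredScannerLeases = new AtomicLong();

    // Hypothetical hook: called from the lease-expiration path, right where
    // the RegionServer already logs the complaint and closes the scanner.
    static void onLeaseExpired(String scannerId) {
        long total = expiredScannerLeases.incrementAndGet();
        System.out.println("Scanner " + scannerId + " lease expired; total=" + total);
    }

    public static void main(String[] args) {
        onLeaseExpired("scanner-1");
        onLeaseExpired("scanner-2");
    }
}
```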





[jira] [Updated] (HBASE-3859) Increment a counter when a Scanner lease expires

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3859:
-
Tags: monitoring, metrics, b  (was: monitoring, metrics)

> Increment a counter when a Scanner lease expires
> 
>
> Key: HBASE-3859
> URL: https://issues.apache.org/jira/browse/HBASE-3859
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Affects Versions: 2.0.0
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-3859.trunk.v1.patch
>
>
> Whenever a Scanner lease expires, the RegionServer will close it 
> automatically and log a message to complain.  I would like the RegionServer 
> to increment a counter whenever this happens and expose this counter through 
> the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
> and keep track of how frequently this happens.  It's not supposed to happen 
> frequently so it's good to keep an eye on it.





[jira] [Updated] (HBASE-3859) Increment a counter when a Scanner lease expires

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3859:
-
Affects Version/s: (was: 0.90.2)
   2.0.0

> Increment a counter when a Scanner lease expires
> 
>
> Key: HBASE-3859
> URL: https://issues.apache.org/jira/browse/HBASE-3859
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Affects Versions: 2.0.0
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-3859.trunk.v1.patch
>
>
> Whenever a Scanner lease expires, the RegionServer will close it 
> automatically and log a message to complain.  I would like the RegionServer 
> to increment a counter whenever this happens and expose this counter through 
> the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
> and keep track of how frequently this happens.  It's not supposed to happen 
> frequently so it's good to keep an eye on it.





[jira] [Updated] (HBASE-3859) Increment a counter when a Scanner lease expires

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-3859:
-
Labels: beginner  (was: )

> Increment a counter when a Scanner lease expires
> 
>
> Key: HBASE-3859
> URL: https://issues.apache.org/jira/browse/HBASE-3859
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Affects Versions: 2.0.0
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-3859.trunk.v1.patch
>
>
> Whenever a Scanner lease expires, the RegionServer will close it 
> automatically and log a message to complain.  I would like the RegionServer 
> to increment a counter whenever this happens and expose this counter through 
> the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
> and keep track of how frequently this happens.  It's not supposed to happen 
> frequently so it's good to keep an eye on it.





[jira] [Commented] (HBASE-3859) Increment a counter when a Scanner lease expires

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671687#comment-15671687
 ] 

Esteban Gutierrez commented on HBASE-3859:
--

Sounds helpful to have, especially now that we have a better metrics framework.

> Increment a counter when a Scanner lease expires
> 
>
> Key: HBASE-3859
> URL: https://issues.apache.org/jira/browse/HBASE-3859
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Affects Versions: 2.0.0
>Reporter: Benoit Sigoure
>Assignee: Mubarak Seyed
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-3859.trunk.v1.patch
>
>
> Whenever a Scanner lease expires, the RegionServer will close it 
> automatically and log a message to complain.  I would like the RegionServer 
> to increment a counter whenever this happens and expose this counter through 
> the metrics system, so we can plug this into our monitoring system (OpenTSDB) 
> and keep track of how frequently this happens.  It's not supposed to happen 
> frequently so it's good to keep an eye on it.





[jira] [Resolved] (HBASE-3975) NoServerForRegionException stalls write pipeline

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3975.
--
Resolution: Fixed

The new async client is taking care of this.

> NoServerForRegionException stalls write pipeline
> 
>
> Key: HBASE-3975
> URL: https://issues.apache.org/jira/browse/HBASE-3975
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.89.20100924, 0.90.3, 0.92.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
>
> When we process a batch of puts, the current algorithm basically goes like 
> this:
> 1. Find all servers for the Put requests
> 2. Partition Puts by servers
> 3. Make requests
> 4. Collect success/error results
> If we throw an IOE in step 1 or 2, we will abort the whole batch operation.  
> In our case, this was an NoServerForRegionException due to region 
> rebalancing.  However, the asynchronous put case normally has requests going 
> to a wide variety of servers.  We should fail all the put requests that throw 
> an IOE in Step 1 but continue to try all the put requests that succeed at 
> this stage.
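The per-request failure handling argued for here can be sketched as follows. `lookupServer`, the row keys, and the outcome map are stand-ins, not the HBase client API; the point is that a lookup failure in step 1 fails only that request, while the rest of the batch is still partitioned and queued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchSketch {
    // Stand-in for step 1: may throw for an unassigned region.
    static String lookupServer(String row) {
        if (row.startsWith("bad")) {
            throw new RuntimeException("NoServerForRegion: " + row);
        }
        return "server-" + row.charAt(0);
    }

    // Partition puts by server (step 2), recording a per-row outcome instead
    // of aborting the whole batch when one lookup throws.
    static Map<String, String> submitBatch(List<String> rows) {
        Map<String, List<String>> byServer = new HashMap<>();
        Map<String, String> outcome = new LinkedHashMap<>();
        for (String row : rows) {
            try {
                byServer.computeIfAbsent(lookupServer(row), s -> new ArrayList<>()).add(row);
                outcome.put(row, "queued");
            } catch (RuntimeException e) {
                outcome.put(row, "failed: " + e.getMessage());
            }
        }
        // Steps 3-4 (issuing per-server requests, collecting results) omitted.
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(submitBatch(Arrays.asList("alpha", "bad-row", "beta")));
    }
}
```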





[jira] [Resolved] (HBASE-3991) Add Util folder for Utility Scripts

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-3991.
--
Resolution: Won't Fix

No progress on this in 5 years. We tend to unify things in the main hbase 
script or the hbase shell, and in some cases, like region_mover.rb, we ended up 
creating better tooling.

> Add Util folder for Utility Scripts
> ---
>
> Key: HBASE-3991
> URL: https://issues.apache.org/jira/browse/HBASE-3991
> Project: HBase
>  Issue Type: Brainstorming
>  Components: scripts, util
>Affects Versions: 0.92.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
>
> This JIRA is to start discussion around adding some sort of 'util' folder to 
> HBase for common operational scripts.  We're starting to write a lot of HBase 
> analysis utilities that we'd love to share with open source, but don't want 
> to clutter the 'bin' folder, which seems like it should be reserved for 
> start/stop tasks.  If we add a 'util' folder, how do we keep it from becoming 
> a cesspool of half-baked & duplicated operational hacks?





[jira] [Resolved] (HBASE-6205) Support an option to keep data of dropped table for some time

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-6205.
--
Resolution: Later

Resolving for later. We already have the archive and snapshots, and we could 
take care of this after HBASE-14439.

> Support an option to keep data of dropped table for some time
> -
>
> Key: HBASE-6205
> URL: https://issues.apache.org/jira/browse/HBASE-6205
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 0.94.0, 0.95.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: HBASE-6205.patch, HBASE-6205v2.patch, 
> HBASE-6205v3.patch, HBASE-6205v4.patch, HBASE-6205v5.patch
>
>
> A user may drop a table accidentally because of erroneous code or other 
> reasons.
> Unfortunately, it happened in our environment because one user made a mistake 
> between the production cluster and the testing cluster.
> So, I suggest we support an option to keep the data of dropped tables for some 
> time, e.g. 1 day.
> In the patch:
> We make a new dir named .trashtables in the root dir.
> In the DeleteTableHandler, we move the files in a dropped table's dir to the 
> trash table dir instead of deleting them directly.
> And we create a new class, TrashCleaner, which cleans up dropped tables once 
> they time out, with a periodic check.
> The default keep time for dropped tables is 1 day, and the check period is 1 hour.
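The trash-and-cleaner flow in the patch summary can be sketched like this, with an in-memory map standing in for the .trashtables directory; all names and the timestamp handling are illustrative, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class TrashCleanerSketch {
    static final long KEEP_MILLIS = 24L * 60 * 60 * 1000; // default keep time: 1 day
    // Stand-in for the .trashtables dir: table name -> time it was dropped.
    static final Map<String, Long> trash = new HashMap<>();

    // DeleteTableHandler analogue: move the table into the trash area with a
    // drop timestamp instead of deleting its files directly.
    static void dropTable(String name, long now) {
        trash.put(name, now);
    }

    // TrashCleaner analogue, invoked on a period (1 hour by default): purge
    // entries whose keep time has elapsed.
    static void cleanExpired(long now) {
        trash.values().removeIf(droppedAt -> now - droppedAt > KEEP_MILLIS);
    }

    public static void main(String[] args) {
        dropTable("t1", 0);
        dropTable("t2", KEEP_MILLIS);      // dropped a day later
        cleanExpired(KEEP_MILLIS + 1);     // t1 has exceeded its keep time, t2 has not
        System.out.println(trash.keySet());
    }
}
```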





[jira] [Updated] (HBASE-9984) AggregationClient creates a new Htable, HConnection,and ExecutorService in every CP call.

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-9984:
-
Affects Version/s: 2.0.0

> AggregationClient creates a new Htable, HConnection,and ExecutorService in 
> every CP call.
> -
>
> Key: HBASE-9984
> URL: https://issues.apache.org/jira/browse/HBASE-9984
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Coprocessors
>Affects Versions: 0.94.13, 2.0.0
>Reporter: Anil Gupta
>Priority: Minor
>  Labels: aggregate, client, coprocessors, hbase
>
> At present AggregationClient takes a Conf in its constructor and creates a new 
> HTable instance on every method call. The constructor of the HTable used in 
> AggregationClient is very heavy, as it creates a new HConnection and 
> ExecutorService. 
> The above mechanism is not convenient when the application manages the HTable, 
> HConnection, and ExecutorService by itself. So, I propose: 
> 1# AggregationClient should provide an additional constructor: 
> AggregationClient(HTable)
> 2# Provide methods that take an HTable.
> In this way we can avoid the creation of an HTable, HConnection, and 
> ExecutorService in every CP call. 
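The proposed constructor overload can be sketched as follows. `TableHandle` stands in for `HTable`, and the counter only illustrates the connection/executor cost being avoided; none of this is the actual AggregationClient API.

```java
public class AggClientSketch {
    static int connectionsOpened = 0;

    // Stand-in for HTable: heavy to construct (connection + executor).
    static class TableHandle {
        TableHandle() { connectionsOpened++; }
        long rowCount() { return 42; }
    }

    private final TableHandle table;

    // Existing behaviour: every client builds (and pays for) its own table.
    AggClientSketch() { this.table = new TableHandle(); }

    // Proposed overload: reuse a table handle the caller already manages.
    AggClientSketch(TableHandle t) { this.table = t; }

    long max() { return table.rowCount(); }
    long min() { return table.rowCount(); }

    public static void main(String[] args) {
        TableHandle shared = new TableHandle();
        AggClientSketch client = new AggClientSketch(shared);
        client.max();
        client.min();
        // One connection for both calls, not one per call.
        System.out.println("connections opened: " + connectionsOpened);
    }
}
```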





[jira] [Resolved] (HBASE-9968) Cluster is non operative if the RS carrying -ROOT- is expiring after deleting -ROOT- region transition znode and before adding it to online regions.

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9968.
--
Resolution: Won't Fix

We no longer have {{-ROOT-}}.

> Cluster is non operative if the RS carrying -ROOT- is expiring after deleting 
> -ROOT- region transition znode and before adding it to online regions.
> 
>
> Key: HBASE-9968
> URL: https://issues.apache.org/jira/browse/HBASE-9968
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 0.94.11
>Reporter: rajeshbabu
>Assignee: rajeshbabu
>
> When we check whether a dead server is carrying root or meta, first we 
> check whether a transition znode for the region exists. In this case it 
> got deleted, so from ZooKeeper we cannot find the region location. 
> {code}
> try {
>   data = ZKAssign.getData(master.getZooKeeper(), hri.getEncodedName());
> } catch (KeeperException e) {
>   master.abort("Unexpected ZK exception reading unassigned node for 
> region="
> + hri.getEncodedName(), e);
> }
> {code}
> Now we check with the AssignmentManager whether the region is in the online 
> regions or not
> {code}
> ServerName addressFromAM = getRegionServerOfRegion(hri);
> boolean matchAM = (addressFromAM != null &&
>   addressFromAM.equals(serverName));
> LOG.debug("based on AM, current region=" + hri.getRegionNameAsString() +
>   " is on server=" + (addressFromAM != null ? addressFromAM : "null") +
>   " server being checked: " + serverName);
> {code}
> From the AM we get null, because while adding a region to the online regions 
> we check whether the RS is in the online servers, and if it is not, we do not 
> add the region to the online regions.
> {code}
>   if (isServerOnline(sn)) {
> this.regions.put(regionInfo, sn);
> addToServers(sn, regionInfo);
> this.regions.notifyAll();
>   } else {
> LOG.info("The server is not in online servers, ServerName=" + 
>   sn.getServerName() + ", region=" + regionInfo.getEncodedName());
>   }
> {code}
> Even though the dead regionserver is carrying the ROOT region, this returns 
> false. After that, the ROOT region is never assigned.
> Here are the logs
> {code}
> 2013-11-11 18:04:14,730 INFO 
> org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
> location in ZooKeeper
> 2013-11-11 18:04:14,775 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
> was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so 
> generated a random one; hri=-ROOT-,,0.70236052, src=, 
> dest=HOST-10-18-40-69,60020,1384173244404; 1 (online=1, available=1) 
> available servers
> 2013-11-11 18:04:14,809 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> -ROOT-,,0.70236052 to HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:18,375 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
> Looked up root region location, 
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@12133926;
>  serverName=HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:26,213 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=HOST-10-18-40-69,60020,1384173244404, 
> region=70236052/-ROOT-
> 2013-11-11 18:04:26,213 INFO 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for -ROOT-,,0.70236052 from HOST-10-18-40-69,60020,1384173244404; 
> deleting unassigned node
> 2013-11-11 18:04:31,553 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current 
> region=-ROOT-,,0.70236052 is on server=null server being checked: 
> HOST-10-18-40-69,60020,1384173244404
> 2013-11-11 18:04:31,561 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Added=HOST-10-18-40-69,60020,1384173244404 to dead servers, submitted 
> shutdown handler to be executed, root=false, meta=false
> {code}
> {code}
> 2013-11-11 18:04:32,323 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> -ROOT-,,0.70236052 has been deleted.
> 2013-11-11 18:04:32,323 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: The server is not in online 
> servers, ServerName=HOST-10-18-40-69,60020,1384173244404, region=70236052
> 2013-11-11 18:04:32,323 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the 
> region -ROOT-,,0.70236052 that was online on 
> HOST-10-18-40-69,60020,1384173244404
> {code}





[jira] [Created] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size

2016-11-16 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-17116:
-

 Summary: [PerformanceEvaluation] Add option to configure block size
 Key: HBASE-17116
 URL: https://issues.apache.org/jira/browse/HBASE-17116
 Project: HBase
  Issue Type: Bug
  Components: tooling
Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5
Reporter: Esteban Gutierrez
Priority: Trivial


Followup from HBASE-9940 to add option to configure block size for 
PerformanceEvaluation.





[jira] [Resolved] (HBASE-9940) PerformanceEvaluation should have a test with many table options on (Bloom, compression, FAST_DIFF, etc.)

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9940.
--
Resolution: Fixed

Most of the features requested by [~jmspaggi] are already present in 
PerformanceEvaluation. Created HBASE-17116 to address the missing option to 
configure the block size.


> PerformanceEvaluation should have a test with many table options on (Bloom, 
> compression, FAST_DIFF, etc.)
> -
>
> Key: HBASE-9940
> URL: https://issues.apache.org/jira/browse/HBASE-9940
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, test
>Affects Versions: 0.96.0, 0.94.13
>Reporter: Jean-Marc Spaggiari
>Priority: Minor
>






[jira] [Resolved] (HBASE-9925) Don't close a file if doesn't EOF while replicating

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9925.
--
Resolution: Later

Resolving for later; we should first fix other replication bottlenecks before 
we hit contention from the NN.

> Don't close a file if doesn't EOF while replicating
> ---
>
> Key: HBASE-9925
> URL: https://issues.apache.org/jira/browse/HBASE-9925
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Himanshu Vashishtha
>
> While doing replication, we open and close the WAL file _every_ time we read 
> entries to send. We could open/close the reader only when we hit EOF. That 
> would alleviate some NN load, especially on a write heavy cluster.
> This came while discussing our current open/close heuristic in replication 
> with [~jdcryans].





[jira] [Resolved] (HBASE-9913) weblogic deployment project implementation under the mapreduce hbase reported a NullPointerException

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9913.
--
Resolution: Duplicate

Fixed in HBASE-12491

> weblogic deployment project implementation under the mapreduce hbase reported 
> a NullPointerException
> 
>
> Key: HBASE-9913
> URL: https://issues.apache.org/jira/browse/HBASE-9913
> Project: HBase
>  Issue Type: Bug
>  Components: hadoop2, mapreduce
>Affects Versions: 0.94.10
> Environment: weblogic windows
>Reporter: 刘泓
> Attachments: TableMapReduceUtil.class, TableMapReduceUtil.java
>
>
> java.lang.NullPointerException
>   at java.io.File.<init>(File.java:222)
>   at java.util.zip.ZipFile.<init>(ZipFile.java:75)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.updateMap(TableMapReduceUtil.java:617)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:597)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:557)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:518)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:144)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:221)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:87)
>   at 
> com.easymap.ezserver6.map.source.hbase.convert.HBaseMapMerge.beginMerge(HBaseMapMerge.java:163)
>   at 
> com.easymap.ezserver6.app.servlet.EzMapToHbaseService.doPost(EzMapToHbaseService.java:32)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
>   at 
> weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
>   at 
> weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
>   at 
> weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175)
>   at 
> weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3594)
>   at 
> weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
>   at 
> weblogic.security.service.SecurityManager.runAs(SecurityManager.java:121)
>   at 
> weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2202)
>   at 
> weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2108)
>   at 
> weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1432)
>   at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
>   at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
> My project is deployed under WebLogic 11, and when I run an HBase mapreduce job 
> it throws a NullPointerException. I found that the method 
> TableMapReduceUtil.findContainingJar() returns null, so I debugged it: 
> url.getProtocol() returns "zip", but the file is a jar file, so the if condition
>  if ("jar".equals(url.getProtocol()))  cannot match. So I added an if condition 
> to recognize the "zip" type.
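The workaround the reporter describes can be sketched as follows: under WebLogic, classpath entries can surface with a "zip" URL protocol, so a "jar"-only protocol check fails even though the file really is a jar. This mirrors the idea only; it is not the actual TableMapReduceUtil code.

```java
public class JarProtocolCheck {
    // Hypothetical predicate replacing: if ("jar".equals(url.getProtocol()))
    // The "zip" branch is the addition the reporter made for WebLogic.
    static boolean looksLikeJar(String protocol) {
        return "jar".equals(protocol) || "zip".equals(protocol);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJar("jar"));  // standard JVM case
        System.out.println(looksLikeJar("zip"));  // WebLogic case
        System.out.println(looksLikeJar("http")); // still rejected
    }
}
```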





[jira] [Updated] (HBASE-9887) Optimize client pause & back-off time

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-9887:
-
Affects Version/s: 2.0.0

> Optimize client pause & back-off time
> -
>
> Key: HBASE-9887
> URL: https://issues.apache.org/jira/browse/HBASE-9887
> Project: HBase
>  Issue Type: Brainstorming
>  Components: Client
>Affects Versions: 0.98.0, 0.96.0, 2.0.0
>Reporter: Nicolas Liochon
>Priority: Minor
>
> The client can log all the retries, with this setting:
> {noformat}
> <property>
>   <name>hbase.client.start.log.errors.counter</name>
>   <value>0</value>
> </property>
> {noformat}
> We should use it to fix the pause time, as well as the back off.
> I need to do a complete test, but on 100m, with the default config, I saw 
> something like:
> 5% of the first retry were successful
> 5% of ALL the retries between 2 and 6 were successful
> 90% of the retries between 7 and 9 were successful
> So on this (too small) sample, the retries between 2 and 6 are nearly useless.
> I will do a more complete test as well.
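For illustration, a client pause/back-off schedule of the kind being tuned here can be sketched as a base pause multiplied by a per-attempt factor. The multiplier table and base pause below are illustrative values, not HBase's exact defaults; tuning them is the knob this issue proposes revisiting, given that retries 2 through 6 were nearly useless in the sample above.

```java
public class BackoffSketch {
    // Illustrative per-attempt multipliers: early retries are cheap and
    // quick, later ones back off sharply and then plateau.
    static final int[] MULTIPLIERS = {1, 2, 3, 5, 10, 20, 40, 100};
    static final long BASE_PAUSE_MS = 100;

    // Pause before the given retry attempt; attempts beyond the table reuse
    // the last (largest) multiplier.
    static long pauseFor(int attempt) {
        int idx = Math.min(attempt, MULTIPLIERS.length - 1);
        return BASE_PAUSE_MS * MULTIPLIERS[idx];
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": sleep " + pauseFor(attempt) + " ms");
        }
    }
}
```

Skewing the table (for example, lengthening the pauses for attempts 2-6) is the kind of change the retry-success statistics above would motivate.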





[jira] [Updated] (HBASE-9879) Can't undelete a KeyValue

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-9879:
-
Affects Version/s: 2.0.0

> Can't undelete a KeyValue
> -
>
> Key: HBASE-9879
> URL: https://issues.apache.org/jira/browse/HBASE-9879
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0, 2.0.0
>Reporter: Benoit Sigoure
>
> Test scenario:
> put(KV, timestamp=100)
> put(KV, timestamp=200)
> delete(KV, timestamp=200, with MutationProto.DeleteType.DELETE_ONE_VERSION)
> get(KV) => returns value at timestamp=100 (OK)
> put(KV, timestamp=200)
> get(KV) => returns value at timestamp=100 (but not the one at timestamp=200 
> that was "reborn" by the previous put)
> Is that normal?
> I ran into this bug while running the integration tests at 
> https://github.com/OpenTSDB/asynchbase/pull/60 – the first time you run it, 
> it passes, but after that, it keeps failing.  Sorry I don't have the 
> corresponding HTable-based code but that should be fairly easy to write.
> I only tested this with 0.96.0, dunno yet how this behaved in prior releases.
> My hunch is that the tombstone added by the DELETE_ONE_VERSION keeps 
> shadowing the value even after it's reborn.





[jira] [Commented] (HBASE-9879) Can't undelete a KeyValue

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671820#comment-15671820
 ] 

Esteban Gutierrez commented on HBASE-9879:
--

This should be addressed by the work on HLC (HBASE-14070) and related JIRAs 
about MVCC (HBASE-15968).


> Can't undelete a KeyValue
> -
>
> Key: HBASE-9879
> URL: https://issues.apache.org/jira/browse/HBASE-9879
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0, 2.0.0
>Reporter: Benoit Sigoure
>
> Test scenario:
> put(KV, timestamp=100)
> put(KV, timestamp=200)
> delete(KV, timestamp=200, with MutationProto.DeleteType.DELETE_ONE_VERSION)
> get(KV) => returns value at timestamp=100 (OK)
> put(KV, timestamp=200)
> get(KV) => returns value at timestamp=100 (but not the one at timestamp=200 
> that was "reborn" by the previous put)
> Is that normal?
> I ran into this bug while running the integration tests at 
> https://github.com/OpenTSDB/asynchbase/pull/60 – the first time you run it, 
> it passes, but after that, it keeps failing.  Sorry I don't have the 
> corresponding HTable-based code but that should be fairly easy to write.
> I only tested this with 0.96.0, dunno yet how this behaved in prior releases.
> My hunch is that the tombstone added by the DELETE_ONE_VERSION keeps 
> shadowing the value even after it's reborn.





[jira] [Resolved] (HBASE-9877) HFileOutputFormat creates partitions file with random name under /tmp

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9877.
--
Resolution: Fixed

The original issue was addressed by HBASE-13010. HBASE-15129 exposes 
hbase.fs.tmp.dir, so you can specify a different location for the partitions 
file.

> HFileOutputFormat creates partitions file with random name under /tmp
> -
>
> Key: HBASE-9877
> URL: https://issues.apache.org/jira/browse/HBASE-9877
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 0.98.0, 0.94.12, 0.96.0
>Reporter: Nick Dimiduk
>Priority: Minor
>
> HFileOutputFormat creates a partitions file for managing reducer partition 
> boundaries. This file is created under /tmp and is just a UUID. As [pointed 
> out|https://issues.apache.org/jira/browse/HBASE-4285?focusedCommentId=13620294&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13620294]
>  by [~lurbina], there's no guarantee this path exists or is user-writable. As 
> [~yuzhih...@gmail.com] 
> [mentions|https://issues.apache.org/jira/browse/HBASE-4285?focusedCommentId=13602934&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13602934],
>  this file name is useless for users trying to debug a failing job. We should 
> give it a meaningful name, based on job name and/or timestamp.
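A meaningful name along those lines could combine a configurable staging directory (cf. hbase.fs.tmp.dir from HBASE-15129) with the job name and a timestamp instead of a bare UUID under /tmp. The helper below is an illustrative sketch, not the actual HFileOutputFormat code.

```java
// Sketch: build a debuggable partitions-file path from a staging dir,
// the job name, and a timestamp. Illustrative only.
public class PartitionsFileName {
    static String partitionsPath(String tmpDir, String jobName, long nowMillis) {
        // Keep the name filesystem-safe while staying recognizable in listings.
        String safeJob = jobName.replaceAll("[^A-Za-z0-9_-]", "_");
        return tmpDir + "/partitions_" + safeJob + "_" + nowMillis;
    }

    public static void main(String[] args) {
        System.out.println(partitionsPath("/user/alice/hbase-staging",
                "bulkload table1", 1479340800000L));
        // -> /user/alice/hbase-staging/partitions_bulkload_table1_1479340800000
    }
}
```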





[jira] [Updated] (HBASE-9844) zookeepers.sh - ZKServerTool log permission issue

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-9844:
-
Affects Version/s: 2.0.0

> zookeepers.sh - ZKServerTool log permission issue
> -
>
> Key: HBASE-9844
> URL: https://issues.apache.org/jira/browse/HBASE-9844
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 0.94.12, 2.0.0
> Environment: Linux
>Reporter: Sebastien Barrier
>Priority: Minor
>  Labels: beginner
>
> The zookeepers.sh script executes the following command as part of its process:
> /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool
> Before doing so it also changes directory into the hbase binary directory, for 
> example 'cd /usr/local/hbase/bin'. If the permissions of that directory differ 
> from those of the user running ZKServerTool (for example the hadoop user versus 
> a root-owned directory), the following error occurs because the tool tries to 
> create a log file (hadoop.log) in the current directory:
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: ./hadoop.log (Permission denied)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:136)
> The log should be written to HBASE_LOG_DIR, not to the current directory.





[jira] [Updated] (HBASE-9844) zookeepers.sh - ZKServerTool log permission issue

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-9844:
-
Labels: beginner  (was: )

> zookeepers.sh - ZKServerTool log permission issue
> -
>
> Key: HBASE-9844
> URL: https://issues.apache.org/jira/browse/HBASE-9844
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 0.94.12, 2.0.0
> Environment: Linux
>Reporter: Sebastien Barrier
>Priority: Minor
>  Labels: beginner
>
> The zookeepers.sh script executes the following command as part of its process:
> /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool
> Before doing so it also changes directory into the hbase binary directory, for 
> example 'cd /usr/local/hbase/bin'. If the permissions of that directory differ 
> from those of the user running ZKServerTool (for example the hadoop user versus 
> a root-owned directory), the following error occurs because the tool tries to 
> create a log file (hadoop.log) in the current directory:
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: ./hadoop.log (Permission denied)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:136)
> The log should be written to HBASE_LOG_DIR, not to the current directory.





[jira] [Commented] (HBASE-9844) zookeepers.sh - ZKServerTool log permission issue

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671860#comment-15671860
 ] 

Esteban Gutierrez commented on HBASE-9844:
--

We shouldn't be logging to a file. We need a flag to log to stderr instead, 
since ZKServerTool only prints the list of the ZK quorum.

> zookeepers.sh - ZKServerTool log permission issue
> -
>
> Key: HBASE-9844
> URL: https://issues.apache.org/jira/browse/HBASE-9844
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 0.94.12, 2.0.0
> Environment: Linux
>Reporter: Sebastien Barrier
>Priority: Minor
>  Labels: beginner
>
> The zookeepers.sh script executes the following command as part of its process:
> /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool
> Before doing so it also changes directory into the hbase binary directory, for 
> example 'cd /usr/local/hbase/bin'. If the permissions of that directory differ 
> from those of the user running ZKServerTool (for example the hadoop user versus 
> a root-owned directory), the following error occurs because the tool tries to 
> create a log file (hadoop.log) in the current directory:
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: ./hadoop.log (Permission denied)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:136)
> The log should be written to HBASE_LOG_DIR, not to the current directory.





[jira] [Resolved] (HBASE-9826) KeyValue should guard itself against corruptions

2016-11-16 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-9826.
--
Resolution: Fixed

In KeyValue.createByteArray() we check for valid parameters used in the 
constructor. Resolving, since we haven't seen these corrupted KVs in a very 
long time.

> KeyValue should guard itself against corruptions
> 
>
> Key: HBASE-9826
> URL: https://issues.apache.org/jira/browse/HBASE-9826
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89-fb
>Reporter: Amitanand Aiyer
>Priority: Minor
>
> We have seen a case where a corrupted KV was causing a flush to fail 
> repeatedly.
> KV seems to have some sanity checks when it is created, but we are not sure 
> how the corrupted KV got in.
> We could add some sanity checks before/after serialization to make sure KVs 
> are not corrupted.
> I've seen this issue on 0.89, but I am not sure about the other versions. 
> Since trunk has moved to pb, this may not apply.





[jira] [Commented] (HBASE-9802) A new failover test framework for HBase

2016-11-16 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671879#comment-15671879
 ] 

Esteban Gutierrez commented on HBASE-9802:
--

[~tobe] do you still have plans to contribute this back?

> A new failover test framework for HBase
> ---
>
> Key: HBASE-9802
> URL: https://issues.apache.org/jira/browse/HBASE-9802
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.94.3
>Reporter: chendihao
>Assignee: chendihao
>Priority: Minor
>
> Currently HBase uses ChaosMonkey for IT tests and fault injection. It 
> restarts regionservers, forces the balancer and performs other actions 
> randomly and periodically. However, we need a more extensible and 
> full-featured framework for our failover testing, and we find ChaosMonkey 
> can't suit our needs since it has the following drawbacks.
> 1) Only process-level actions can be simulated; there is no support for 
> machine-level/hardware-level/network-level actions.
> 2) There is no data validation before and after the test, so fatal bugs such 
> as those that can cause data inconsistency may be overlooked.
> 3) When a failure occurs, we can't reproduce the problem and it is hard to 
> figure out the reason.
> Therefore, we have developed a new framework to satisfy the needs of failover 
> testing. We extended ChaosMonkey and implemented functionality to validate 
> data and to replay failed actions. Here are the features we added.
> 1) Policy/Task/Action abstraction; separating Task from Policy and Action 
> makes it easier to manage and replay a set of actions.
> 2) Actions are configurable. We have implemented some actions that cause 
> machine failure and defined the same interface as the original actions.
> 3) We validate data consistency before and after the failover test to ensure 
> availability and data correctness.
> 4) After performing a set of actions, we also check the consistency of the 
> table.
> 5) The set of actions that caused a test failure can be replayed, and the 
> reproducibility of actions helps fix the exposed bugs.
> Our team has developed this framework and run it for a while. Some bugs were 
> exposed and fixed by running this test framework. Moreover, we have a monitor 
> program which shows the progress of the failover test and makes sure our 
> cluster is as stable as we want. Now we are trying to make it more general 
> and will open-source it later.
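The Policy/Task/Action separation and replay feature described above can be sketched roughly as follows. These interfaces are an illustrative guess at the design, not the framework's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// An Action is one fault-injection step (kill a regionserver, cut a link, ...).
interface Action {
    String name();
    void perform();
}

// A Task is an ordered set of actions; it records what it ran so the exact
// sequence that exposed a bug can be replayed later.
class Task {
    private final List<Action> actions = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    void add(Action a) { actions.add(a); }

    void run() {
        for (Action a : actions) {
            executed.add(a.name());
            a.perform();
        }
    }

    List<String> replayLog() { return executed; }
}

// A Policy decides which tasks to schedule; here, trivially in order.
class Policy {
    void execute(List<Task> tasks) {
        for (Task t : tasks) {
            t.run();
        }
    }
}
```

Separating the three layers this way is what makes replay cheap: a failing run is just a recorded list of action names that can be fed back into a fresh Task.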





[jira] [Updated] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-17 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-17058:
--
Fix Version/s: 1.4.0
   2.0.0

> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, which is too big: it might overflow 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider is to shift the jitter to always decrement 
> hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the size of 
> the region, avoiding having to deal with the round-off.
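To put numbers on the epsilon comparison: at the scale of Long.MAX_VALUE, a relative epsilon of 1E-6 tolerates roughly 9.2e12 bytes of error, while 2^-52 (one double ulp at scale 1.0) tolerates only about 2 KB. A minimal demonstration, using illustrative arithmetic rather than the actual split-policy code:

```java
// Relative slack allowed by each epsilon at the scale of Long.MAX_VALUE.
// Illustrative arithmetic only, not the split-policy code.
public class EpsilonScale {
    public static void main(String[] args) {
        double max = (double) Long.MAX_VALUE;         // exactly 2^63 as a double
        double slackCoarse = max * 1e-6;              // ~9.22e12 bytes (terabytes)
        double slackTight  = max * Math.pow(2, -52);  // 2^63 * 2^-52 = 2^11 = 2048 bytes
        System.out.println(slackCoarse);
        System.out.println(slackTight);
    }
}
```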





[jira] [Updated] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324

2016-11-17 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-17058:
--
Fix Version/s: 1.1.8

> Lower epsilon used for jitter verification from HBASE-15324
> ---
>
> Key: HBASE-17058
> URL: https://issues.apache.org/jira/browse/HBASE-17058
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.1.8
>
> Attachments: HBASE-17058.master.001.patch
>
>
> The current epsilon used is 1E-6, which is too big: it might overflow 
> desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 
> 2^-53. An option to consider is to shift the jitter to always decrement 
> hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the size of 
> the region, avoiding having to deal with the round-off.




