[jira] [Comment Edited] (HBASE-14703) update the per-region stats twice for the call on return

2015-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977858#comment-14977858
 ] 

Anoop Sam John edited comment on HBASE-14703 at 10/28/15 6:49 AM:
--

This wrapper was added by HBASE-5162 (Basic client pushback mechanism).
If we remove it, will we still be tracking all the stats? It looks like this 
wrapper sits in a more generic place than tracking result stats at each and 
every call site.
Ping [~jesse_yates]


was (Author: anoop.hbase):
This wrapper was added by
HBASE-5162 Basic client pushback mechanism

> update the per-region stats twice for the call on return
> 
>
> Key: HBASE-14703
> URL: https://issues.apache.org/jira/browse/HBASE-14703
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-14703.patch
>
>
> In {{AsyncProcess.SingleServerRequestRunnable}}, it seems we update 
> serverStatistics twice.
> The first one is that we wrap {{RetryingCallable}} with 
> {{StatsTrackingRpcRetryingCaller}}, and do the serverStatistics update when we 
> call {{callWithRetries}} and {{callWithoutRetries}}. The related code is below:
> {code}
>   @Override
>   public T callWithRetries(RetryingCallable callable, int callTimeout)
>   throws IOException, RuntimeException {
> T result = delegate.callWithRetries(callable, callTimeout);
> return updateStatsAndUnwrap(result, callable);
>   }
>   @Override
>   public T callWithoutRetries(RetryingCallable callable, int callTimeout)
>   throws IOException, RuntimeException {
> T result = delegate.callWithRetries(callable, callTimeout);
> return updateStatsAndUnwrap(result, callable);
>   }
> {code}
> The second one is after we get the response: in {{receiveMultiAction}}, we 
> update again.
> {code}
> // update the stats about the region, if its a user table. We don't want to 
> slow down
> // updates to meta tables, especially from internal updates (master, etc).
> if (AsyncProcess.this.connection.getStatisticsTracker() != null) {
>   result = ResultStatsUtil.updateStats(result,
>   AsyncProcess.this.connection.getStatisticsTracker(), server, regionName);
> }
> {code}
> It seems that {{StatsTrackingRpcRetryingCaller}} is NOT necessary; should we remove it?
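
For illustration, a minimal, self-contained sketch of the double counting described above. This is not HBase code; the FakeStatsTracker class and all names below are hypothetical stand-ins for the per-region serverStatistics path.

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the per-region ServerStatistics tracker.
class FakeStatsTracker {
  final AtomicInteger updates = new AtomicInteger();
  String updateStats(String result) {   // plays the role of ResultStatsUtil.updateStats(...)
    updates.incrementAndGet();
    return result;
  }
}

public class DoubleUpdateDemo {
  public static void main(String[] args) {
    FakeStatsTracker tracker = new FakeStatsTracker();

    // 1) StatsTrackingRpcRetryingCaller.callWithRetries(...) updates stats on return.
    String result = tracker.updateStats("multi-response");

    // 2) AsyncProcess.receiveMultiAction(...) updates stats again for the same response.
    result = tracker.updateStats(result);

    // Prints 2: the same call is counted twice, which is the duplication this issue describes.
    System.out.println("stat updates for one call: " + tracker.updates.get());
  }
}
{code}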



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14703) update the per-region stats twice for the call on return

2015-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977858#comment-14977858
 ] 

Anoop Sam John commented on HBASE-14703:


This wrapper was added by
HBASE-5162 Basic client pushback mechanism

> update the per-region stats twice for the call on return
> 
>
> Key: HBASE-14703
> URL: https://issues.apache.org/jira/browse/HBASE-14703
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-14703.patch
>
>
> In {{AsyncProcess.SingleServerRequestRunnable}}, it seems we update 
> serverStatistics twice.
> The first one is that we wrap {{RetryingCallable}} with 
> {{StatsTrackingRpcRetryingCaller}}, and do the serverStatistics update when we 
> call {{callWithRetries}} and {{callWithoutRetries}}. The related code is below:
> {code}
>   @Override
>   public T callWithRetries(RetryingCallable callable, int callTimeout)
>   throws IOException, RuntimeException {
> T result = delegate.callWithRetries(callable, callTimeout);
> return updateStatsAndUnwrap(result, callable);
>   }
>   @Override
>   public T callWithoutRetries(RetryingCallable callable, int callTimeout)
>   throws IOException, RuntimeException {
> T result = delegate.callWithRetries(callable, callTimeout);
> return updateStatsAndUnwrap(result, callable);
>   }
> {code}
> The second one is after we get the response: in {{receiveMultiAction}}, we 
> update again.
> {code}
> // update the stats about the region, if its a user table. We don't want to 
> slow down
> // updates to meta tables, especially from internal updates (master, etc).
> if (AsyncProcess.this.connection.getStatisticsTracker() != null) {
>   result = ResultStatsUtil.updateStats(result,
>   AsyncProcess.this.connection.getStatisticsTracker(), server, regionName);
> }
> {code}
> It seems that {{StatsTrackingRpcRetryingCaller}} is NOT necessary; should we remove it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14699) Replication crashes regionservers when hbase.wal.provider is set to multiwal

2015-10-27 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977852#comment-14977852
 ] 

Yu Li commented on HBASE-14699:
---

Hi [~ashu210890],
The issue with ReplicationManager#cleanOldLogs has already been found and 
addressed by HBASE-6617 (refer to [this 
comment|https://issues.apache.org/jira/browse/HBASE-6617?focusedCommentId=14708924&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14708924]).
HBASE-6617 is already integrated into branch-1 and master, but not branch-1.2. 
I believe branch-1 won't have this issue; you could give it a try if possible.

[~busbey], feel free to let me know if you would like to take HBASE-6617 into 
branch-1.2; I could make a quick patch, although there may be some rebase work 
:-)

> Replication crashes regionservers when hbase.wal.provider is set to multiwal
> 
>
> Key: HBASE-14699
> URL: https://issues.apache.org/jira/browse/HBASE-14699
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
>
> When the hbase.wal.provider is set to multiwal and replication is enabled, 
> the regionservers start crashing with the following exception:
> {code}
> ,16020,1445495411258: Failed to write replication wal position 
> (filename=%2C16020%2C1445495411258.null0.1445495898373, 
> position=1322399)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/,16020,1445495411258/1/%2C16020%2C1445495411258.null0.1445495898373
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
>   at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:429)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:940)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:990)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:984)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.setLogPosition(ReplicationQueuesZKImpl.java:129)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:177)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:388)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977841#comment-14977841
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
What if the regionserver crashed before flushing HFile? I think the record will 
come back since it has already been persisted in WAL.
{quote}
Indeed, with the current logic, when a sync fails the memstore is rolled back 
and the client is told the write failed.
And if the RS crashes before the memstore flush, the record will come back after 
the WAL is replayed. So the client will find that the write did not actually 
fail, which is inconsistent!

{quote}
Add a marker maybe a solution, but you need to check the marker everywhere when 
replaying WAL, and you still need to deal with the failure when placing 
marker... 
{quote}
Agreed!

Thanks [~Apache9] for your reply!
 

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which causes the slave cluster to have more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (maybe partially) been transported to the DNs and finally gets persisted. As a 
> result, the handler will roll back the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14529) Respond to SIGHUP to reload config

2015-10-27 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977839#comment-14977839
 ] 

ramkrishna.s.vasudevan commented on HBASE-14529:


Pushed the addendum to master, 1.2 and 1.3. Thanks for the addendum [~ashish 
singhi] and thanks [~eclark] for the review. Sorry for being late on 
committing. 

> Respond to SIGHUP to reload config
> --
>
> Key: HBASE-14529
> URL: https://issues.apache.org/jira/browse/HBASE-14529
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14529-addendum.patch, HBASE-14529-v1.patch, 
> HBASE-14529-v2.patch, HBASE-14529.patch
>
>
> SIGHUP is the way everyone since the dawn of unix has done config reload.
> Lets not be a special unique snowflake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977832#comment-14977832
 ] 

Hudson commented on HBASE-14709:


FAILURE: Integrated in HBase-0.98 #1170 (See 
[https://builds.apache.org/job/HBase-0.98/1170/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
5e22512cfdcf6cdf84c5041d1c2aaf50cf169a9f)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977831#comment-14977831
 ] 

Hudson commented on HBASE-14674:


FAILURE: Integrated in HBase-0.98 #1170 (See 
[https://builds.apache.org/job/HBase-0.98/1170/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev 7bef3b196b589a13cdc2f89560bef7f7d154ba5f)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After pluggable RPC scheduler, the way the tasks work for the handlers got 
> changed. We no longer list idle RPC handlers in the tasks, but we register 
> them dynamically to {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but not idle handlers). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation for the MonitoredTask anymore, but instead allocate 
> one for every RPC call breaking the pattern (See CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and created in the RPC listener thread, the created task ends up 
> having the name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977830#comment-14977830
 ] 

Hudson commented on HBASE-14705:


FAILURE: Integrated in HBase-0.98 #1170 (See 
[https://builds.apache.org/job/HBase-0.98/1170/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev fd18723e37e358a5287575feab84da258f074337)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.
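
For illustration, a possible corrected javadoc for the constructor above; this is only a sketch, and the wording actually committed for HBASE-14705 may differ:

{code}
  /**
   * Constructs a KeyValue structure as a Put with the given value and
   * LATEST_TIMESTAMP.
   * @param row - row key (arbitrary byte array)
   * @param family family name
   * @param qualifier column qualifier
   * @param value column value
   */
  public KeyValue(final byte [] row, final byte [] family,
      final byte [] qualifier, final byte [] value) {
    this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, value);
  }
{code}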



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977829#comment-14977829
 ] 

Hudson commented on HBASE-14680:


FAILURE: Integrated in HBase-0.98 #1170 (See 
[https://builds.apache.org/job/HBase-0.98/1170/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 82464eacb837f1f74d3937da6e291a7add4c8c3a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used in disabled 
> table snapshots.
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6T table which is medium sized fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977826#comment-14977826
 ] 

Pankaj Kumar commented on HBASE-14425:
--

Thanks [~enis] for reviewing and committing the patch :)

> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting "hbase.superuser" value to Znode which will 
> cause an issue when multiple values are configured. In "hbase.superuser" 
> multiple superusers and supergroups can be configured separated by comma. We 
> need to iterate them and set ACL.
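
For illustration, a minimal sketch of the iteration the description asks for, using the same ZooKeeper ACL/Id/Perms classes as the snippet above; the "auth" scheme simply mirrors that snippet, and any special handling for group entries is not covered here:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class SuperUserAclSketch {
  // Split the comma-separated "hbase.superuser" value and grant each entry ALL perms.
  static List<ACL> aclsForSuperUsers(String superUserConf) {
    List<ACL> acls = new ArrayList<>();
    if (superUserConf == null) {
      return acls;
    }
    for (String user : superUserConf.split(",")) {
      user = user.trim();
      if (!user.isEmpty()) {
        acls.add(new ACL(Perms.ALL, new Id("auth", user)));  // scheme mirrors the snippet above
      }
    }
    return acls;
  }
}
{code}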



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin

2015-10-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977791#comment-14977791
 ] 

Lars Hofhansl commented on HBASE-14511:
---

I can also see a new filter API: as an optional optimization, a filter could be 
passed an HFile meta block (or whatever abstraction is useful) and then decide to 
filter the entire file. I.e. the meta data here is only useful if one can act on 
it in a meaningful way.

> StoreFile.Writer Meta Plugin
> 
>
> Key: HBASE-14511
> URL: https://issues.apache.org/jira/browse/HBASE-14511
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14511-v3.patch, HBASE-14511.v1.patch, 
> HBASE-14511.v2.patch
>
>
> During my work on new compaction policies (HBASE-14468, HBASE-14477) I had 
> to modify the existing code of StoreFile.Writer to add additional meta-info 
> required by these new policies. I think that it should be done by means of a 
> new Plugin framework, because this seems to be a general capability/feature. 
> As a future enhancement this can become a part of a more general 
> StoreFileWriter/Reader plugin architecture. But I need only the Meta section of 
> a store file.
> This could be used, for example, to collect rowkey distribution information 
> during hfile creation. This info can be used later to find the optimal region 
> split key or to create an optimal set of sub-regions for M/R jobs or other jobs 
> which can operate on a sub-region level.
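
To make the proposal concrete, here is a purely hypothetical sketch of what such a meta plugin hook could look like; the interface and method names are invented for illustration and are not taken from the attached patches:

{code}
import java.util.Map;

import org.apache.hadoop.hbase.Cell;

// Hypothetical plugin contract: observe cells as the writer appends them and
// contribute extra entries to the store file's meta section on close.
interface StoreFileWriterMetaPlugin {
  void cellWritten(Cell cell);            // called for every appended cell
  Map<byte[], byte[]> getMetaEntries();   // extra meta key/values written at close
}
{code}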



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-10-27 Thread Eshcar Hillel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977783#comment-14977783
 ] 

Eshcar Hillel commented on HBASE-13408:
---

The attached patch fixes the test failures and adds support for setting the 
compacted memstore through HColumnDescriptor methods:
String getMemStoreClassName()
HColumnDescriptor setMemStoreClass(String className)
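
For illustration, usage would look roughly like the sketch below; the two methods come from the attached patch (they are not in released HBase), and the CompactedMemStore class name is a placeholder:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

public class CompactedMemStoreConfigSketch {
  // Sketch only: setMemStoreClass/getMemStoreClassName are added by the attached
  // patch, and "CompactedMemStore" is a placeholder implementation class name.
  static HColumnDescriptor inMemoryCompactedFamily() {
    HColumnDescriptor hcd = new HColumnDescriptor("cf");
    hcd.setMemStoreClass("org.apache.hadoop.hbase.regionserver.CompactedMemStore");
    System.out.println(hcd.getMemStoreClassName());
    return hcd;
  }
}
{code}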

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster the data accumulates in memory, the more flushes are 
> triggered and the more frequently the data sinks to disk, slowing down 
> retrieval of data, even very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977769#comment-14977769
 ] 

Duo Zhang commented on HBASE-14004:
---

The problem here does not only affect replication.

{quote}
As a result, the handler will rollback the Memstore and the later flushed HFile 
will also skip this record.
{quote}

What if the regionserver crashed before flushing HFile? I think the record will 
come back since it has already been persisted in WAL.

Adding a marker may be a solution, but you need to check the marker everywhere 
when replaying the WAL, and you still need to deal with failures when placing 
the marker... I do not think it is easy to do...

The basic problem here is that we may have an inconsistency between the memstore 
and the WAL when we fail to sync the WAL.
A simple solution is to kill the regionserver when we fail to sync the WAL, which 
means we never roll back the memstore but instead reconstruct it from the WAL. We 
can make sure there is no difference between the memstore and the WAL in this 
situation.
If we want to keep the regionserver alive when a sync fails, then I think we need 
to find out the real result of the sync operation. Maybe we could close the WAL 
file and check its length? Of course, if we have lost the connection to the 
namenode, I think there is no simple solution other than killing the 
regionserver...

Thanks.
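
For illustration, a toy model of the simplified write path from the description, showing the window discussed in the comment above; the classes and names below are invented for the sketch and are not HBase code:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the write path: 1. insert into memstore, 2. append to WAL,
// 3. sync WAL, 4. rollback memstore if the sync reports failure.
public class WalSyncSketch {
  static List<String> memstore = new ArrayList<>();
  static List<String> wal = new ArrayList<>();

  static void write(String record, boolean syncReportsFailure) throws IOException {
    memstore.add(record);                    // step 1
    wal.add(record);                         // step 2: in the failure scenario the edit
                                             // still reaches the DataNodes and persists
    if (syncReportsFailure) {                // step 3: sync RPC reports failure
      memstore.remove(record);               // step 4: rollback; client sees "write failed",
      throw new IOException("sync failed");  // yet the WAL (and replication) still carry it
    }
  }

  public static void main(String[] args) {
    try {
      write("row1", true);
    } catch (IOException ignored) {
    }
    // Inconsistency: the memstore/HFile path dropped the record, the WAL kept it.
    System.out.println("memstore=" + memstore + " wal=" + wal);
  }
}
{code}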

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which causes the slave cluster to have more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (maybe partially) been transported to the DNs and finally gets persisted. As a 
> result, the handler will roll back the Memstore and the later flushed HFile 
> will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14660) AssertionError found when using offheap BucketCache with assertion enabled

2015-10-27 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977768#comment-14977768
 ] 

ramkrishna.s.vasudevan commented on HBASE-14660:


Can I get some +1s to commit this?

> AssertionError found when using offheap BucketCache with assertion enabled
> --
>
> Key: HBASE-14660
> URL: https://issues.apache.org/jira/browse/HBASE-14660
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Yu Li
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-14660.patch, HBASE-14660_1.patch, 
> HBASE-14660_2.patch, HBASE-14660_2.patch, HBASE-14660_3.patch
>
>
> During perf verification of HBASE-14463, found offheap BucketCache not 
> working with assertion enabled in hbase-env.sh:
> {noformat}
> export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError 
> -XX:+UseConcMarkSweepGC"
> {noformat}
> And the error when running PE tool is like:
> {noformat}
> 15/10/21 16:06:34 INFO client.AsyncProcess: #10, table=TestTable, 
> attempt=10/21 failed=20ops, last exception: java.io.IOException: 
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2181)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.AssertionError
> at 
> org.apache.hadoop.hbase.OffheapKeyValue.(OffheapKeyValue.java:52)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl$ShareableMemoryOffheapKeyValue.(HFileReaderImpl.java:1003)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.getCell(HFileReaderImpl.java:949)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:201)
> at 
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:323)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:279)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:813)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:641)
> at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:153)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5649)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5795)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5568)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5544)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5530)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2044)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:663)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2156)
> {noformat}
> [~ram_krish] and [~anoop.hbase], mind to take a look?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977758#comment-14977758
 ] 

Anoop Sam John commented on HBASE-14355:


Ya, no need to worry about the line length issue in PB-generated files. Actually 
we avoid line length checks for generated files in our pre-commit build script. 
Seems that is broken?

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.16
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v2.patch, 
> HBASE-14355-v3.patch, HBASE-14355-v4.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store it in a Map.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently have available which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use 
> which also appears to be a workaround for not having the family available). 
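
For illustration, client usage of the proposal might look like the sketch below; the per-family overload is the signature proposed in the description, not an existing method, and the final committed API may differ:

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PerFamilyTimeRangeSketch {
  // Sketch of the proposed API: a table-level range plus a tighter range for one family.
  static Scan buildScan() throws IOException {
    Scan scan = new Scan();
    scan.setTimeRange(1L, 100L);                        // table-level range (existing API)
    scan.setTimeRange(Bytes.toBytes("cf1"), 40L, 60L);  // proposed per-family override for cf1
    return scan;
  }
}
{code}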



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14355) Scan different TimeRange for each column family

2015-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977755#comment-14977755
 ] 

Anoop Sam John edited comment on HBASE-14355 at 10/28/15 5:23 AM:
--

Some comments
- The Map of cf vs TimeRange is repeated in Scan and Get. We can keep it in 
Query and add the setter and getter there. We need overloaded setter definitions 
in Scan and Get (for e.g. see Query#setFilter and Scan#setFilter etc).
- In the copy constructors of Get and Scan, we are just copying the reference. 
This should be avoided. See how we handle the family map and recreate it for the 
new object.
- Need to handle it in the constructor Scan(Get get).
- {quote}byte[] cf = Bytes.toBytes(store.getColumnFamilyName());{quote} The 
call to getColumnFamilyName will convert the byte[] name into a String, and here 
we convert it again to byte[]. Unwanted ops. You can use 
store.getFamily().getName().


was (Author: anoop.hbase):
Some comments
- The Map for cf vs TimeRange is repeated in Scan and Get.. We can keep it in 
Query. Add setter and getter there.  We need overloaded setters definition in 
Scan and Get.(For eg: see Query#setFilter and Scan#setFilter etc)
- In copy constructors of Get and Scan, just copying the reference. This should 
be avoided.  See how we handle family map and recreate it for the new Object
- Need to handle it in constructor Scan(Get get)
- bq.byte[] cf = Bytes.toBytes(store.getColumnFamilyName()); You can see 
call to getColumnFamilyName will convert the byte[] name into a string and here 
we will convert it again to byte[].  Unwanted ops. U can use store.getFamily 
().getName ()

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.16
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v2.patch, 
> HBASE-14355-v3.patch, HBASE-14355-v4.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store it in a Map.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently have available which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use 
> which also appears to be a workaround for not having the family available). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-10-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977755#comment-14977755
 ] 

Anoop Sam John commented on HBASE-14355:


Some comments
- The Map of cf vs TimeRange is repeated in Scan and Get. We can keep it in 
Query and add the setter and getter there. We need overloaded setter definitions 
in Scan and Get (for e.g. see Query#setFilter and Scan#setFilter etc).
- In the copy constructors of Get and Scan, we are just copying the reference. 
This should be avoided. See how we handle the family map and recreate it for the 
new object.
- Need to handle it in the constructor Scan(Get get).
- bq.byte[] cf = Bytes.toBytes(store.getColumnFamilyName()); The call to 
getColumnFamilyName will convert the byte[] name into a String, and here we 
convert it again to byte[]. Unwanted ops. You can use 
store.getFamily().getName().

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.16
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v2.patch, 
> HBASE-14355-v3.patch, HBASE-14355-v4.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store it in a Map.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently have available which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use 
> which also appears to be a workaround for not having the family available). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977746#comment-14977746
 ] 

Hadoop QA commented on HBASE-14700:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769148/HBASE-14700.patch
  against master branch at commit e24d03b10c34cca4e51d037ae51fef4eca1666de.
  ATTACHMENT ID: 12769148

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1733 checkstyle errors (more than the master's current 1732 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16254//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16254//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16254//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16254//console

This message is automatically generated.

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 2.0.0
>
> Attachments: HBASE-14700.patch
>
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977736#comment-14977736
 ] 

Hadoop QA commented on HBASE-12769:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769179/12769-v6.txt
  against master branch at commit 210c3dd93748b5de65301f2cca2342f36e169b78.
  ATTACHMENT ID: 12769179

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16256//console

This message is automatically generated.

> Replication fails to delete all corresponding zk nodes when peer is removed
> ---
>
> Key: HBASE-12769
> URL: https://issues.apache.org/jira/browse/HBASE-12769
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 0.99.2
>Reporter: Jianwei Cui
>Assignee: Jianwei Cui
>Priority: Minor
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, 12769-v5.txt, 
> 12769-v6.txt, HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch
>
>
> When removing a peer, the client side will delete peerId under peersZNode 
> node; then alive region servers will be notified and delete corresponding 
> hlog queues under its rsZNode of replication. However, if there are failed 
> servers whose hlog queues have not been transferred by alive servers(this 
> likely happens if setting a big value to "replication.sleep.before.failover" 
> and lots of region servers restarted), these hlog queues won't be deleted 
> after the peer is removed. I think remove_peer should guarantee all 
> corresponding zk nodes have been removed after it completes; otherwise, if we 
> create a new peer with the same peerId as the removed one, there might be 
> unexpected data to be replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14675) Exorcise deprecated Put#add(...) and replace with Put#addColumn(...)

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977732#comment-14977732
 ] 

Hadoop QA commented on HBASE-14675:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769163/hbase-14675.v3.patch
  against master branch at commit e24d03b10c34cca4e51d037ae51fef4eca1666de.
  ATTACHMENT ID: 12769163

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 514 
new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+table.put(new Put(Bytes.toBytes("k")).addColumn(family, 
Bytes.toBytes("q"), Bytes.toBytes("v")));
+  table.put(new Put(Bytes.toBytes("testrow")).addColumn(hcd.getName(), 
Bytes.toBytes("q"), Bytes.toBytes("value")));
+  table.put(new Put(Bytes.toBytes("testrow")).addColumn(hcd.getName(), 
Bytes.toBytes("q"), Bytes.toBytes("value")));
+region.put(new Put(row).addColumn(fam, Bytes.toBytes("qual"), 
System.currentTimeMillis() + 2000, Bytes.toBytes("value")));
+  put.addColumn(families.get(i % families.size()).getName(), 
Bytes.toBytes("q"), Bytes.toBytes("val"));
+  put.addColumn(families.get(i % families.size()).getName(), 
Bytes.toBytes("q"), Bytes.toBytes("val"));
+  put.addColumn(CellUtil.cloneFamily(firstVal), 
CellUtil.cloneQualifier(firstVal), Bytes.toBytes("diff data"));
+t.put(new 
Put(TEST_ROW).addColumn(AccessControlLists.ACL_LIST_FAMILY, TEST_QUALIFIER, 
TEST_VALUE));
+p = new Put(TEST_ROW).addColumn(TEST_FAMILY1, TEST_Q1, 
EnvironmentEdgeManager.currentTime() + 100, ZERO);
+  p = new Put(TEST_ROW).addColumn(TEST_FAMILY, TEST_Q3, 
ZERO).addColumn(TEST_FAMILY, TEST_Q4, ZERO);

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16255//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16255//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16255//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16255//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16255//console

This message is automatically generated.

> Exorcise deprecated Put#add(...) and replace with Put#addColumn(...)
> 
>
> Key: HBASE-14675
> URL: https://issues.apache.org/jira/browse/HBASE-14675
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 2.0.0
>
> Attachments: hbase-14675.patch, hbase-14675.patch, 
> hbase-14675.v2.patch, hbase-14675.v3.patch
>
>
> The Put API changed from #add(...) to #addColumn(...).  This updates all 
> instances of it and removes it from the Put (which was added for hbase 1.0.0)
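
For reference, the client-side change is mechanical, as in the sketch below (the row, family and qualifier names are just examples):

{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutAddColumnSketch {
  static Put makePut() {
    Put p = new Put(Bytes.toBytes("row1"));
    // Before (deprecated, removed by this change):
    //   p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    // After:
    p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    return p;
  }
}
{code}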



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977730#comment-14977730
 ] 

Hadoop QA commented on HBASE-14708:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769165/HBASE-14708-v4.patch
  against master branch at commit 0e6dd3257b1bebe3e12c84aace59dd9cf0dcac2b.
  ATTACHMENT ID: 12769165

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 release 
audit warnings (more than the master's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.replication.TestReplicationKillMasterRS
  org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster
  
org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
  org.apache.hadoop.hbase.client.TestSizeFailures
  
org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
  
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
  org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient
  org.apache.hadoop.hbase.client.TestHCM
  
org.apache.hadoop.hbase.replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleWAL
  
org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure
  org.apache.hadoop.hbase.util.TestHBaseFsckOneRS
  
org.apache.hadoop.hbase.replication.multiwal.TestReplicationSyncUpToolWithMultipleWAL
  
org.apache.hadoop.hbase.mapreduce.TestSecureLoadIncrementalHFilesSplitRecovery
  org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache
  
org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure
  org.apache.hadoop.hbase.replication.TestReplicationSyncUpTool
  org.apache.hadoop.hbase.TestMultiVersions
  org.apache.hadoop.hbase.client.TestFromClientSide

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16253//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16253//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16253//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16253//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16253//console

This message is automatically generated.

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v2.patch, HBASE-14708-v3.patch, 
> HBASE-14708-v4.patch, HBASE-14708.patch, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache. 
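
For illustration, the general copy-on-write idea is sketched below, independent of the attached patch; the class and method names are illustrative only. Readers always see an immutable snapshot, so lookups need no locking, while writers copy, modify and swap:

{code}
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative copy-on-write sorted map: readers get an immutable snapshot with no
// locking; writers copy the whole map, modify the copy, then swap the reference.
public class CowRegionCacheSketch<K extends Comparable<K>, V> {
  private final AtomicReference<NavigableMap<K, V>> ref =
      new AtomicReference<>(Collections.unmodifiableNavigableMap(new TreeMap<K, V>()));

  public V floorValue(K key) {        // lock-free read, e.g. "region containing this row"
    Map.Entry<K, V> e = ref.get().floorEntry(key);
    return e == null ? null : e.getValue();
  }

  public void put(K key, V value) {   // copy, modify, compare-and-swap until it sticks
    NavigableMap<K, V> current;
    NavigableMap<K, V> updated;
    do {
      current = ref.get();
      TreeMap<K, V> copy = new TreeMap<>(current);
      copy.put(key, value);
      updated = Collections.unmodifiableNavigableMap(copy);
    } while (!ref.compareAndSet(current, updated));
  }
}
{code}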

[jira] [Updated] (HBASE-14704) Block cache instances should be mapped to each region servers in standalone mode or HBaseTestingUtility

2015-10-27 Thread Eungsop Yoo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eungsop Yoo updated HBASE-14704:

Status: Patch Available  (was: Open)

> Block cache instances should be mapped to each region servers in standalone 
> mode or HBaseTestingUtility
> ---
>
> Key: HBASE-14704
> URL: https://issues.apache.org/jira/browse/HBASE-14704
> Project: HBase
>  Issue Type: Bug
>Reporter: Eungsop Yoo
>Priority: Minor
> Attachments: HBASE-14704.patch
>
>
> CacheConfig has a single and static block cache instance. When HBase is 
> running in standalone mode or HBaseTestingUtility, a single instance of block 
> cache causes incorrect cache stats. So I suggest changing the single instance 
> of block cache to a map from region server to block cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13014) Java Tool For Region Moving

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977698#comment-14977698
 ] 

Hudson commented on HBASE-13014:


SUCCESS: Integrated in HBase-TRUNK #6970 (See 
[https://builds.apache.org/job/HBase-TRUNK/6970/])
HBASE-13014 Java Tool For Region Moving (Abhishek Singh Chouhan) (apurtell: rev 
939697b415201348ff4523321e316dfaf2206630)
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionMover.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java


> Java Tool For Region Moving 
> 
>
> Key: HBASE-13014
> URL: https://issues.apache.org/jira/browse/HBASE-13014
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-13014-master-v2.patch, HBASE-13014-master.patch, 
> HBASE-13014-v2.patch, HBASE-13014-v3.patch, HBASE-13014-v4.patch, 
> HBASE-13014-v5.patch, HBASE-13014-v6.patch, HBASE-13014.patch
>
>
> As per the discussion on HBASE-12989, we should move the functionality of 
> region_mover.rb into a Java tool and use region_mover.rb only as a 
> wrapper around it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977699#comment-14977699
 ] 

Hudson commented on HBASE-14709:


SUCCESS: Integrated in HBase-TRUNK #6970 (See 
[https://builds.apache.org/job/HBase-TRUNK/6970/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
007e4dfa1384f9746174442a01f512a5744a83da)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977697#comment-14977697
 ] 

Hudson commented on HBASE-14425:


SUCCESS: Integrated in HBase-TRUNK #6970 (See 
[https://builds.apache.org/job/HBase-TRUNK/6970/])
HBASE-14425 In Secure Zookeeper cluster superuser will not have (enis: rev 
0e6dd3257b1bebe3e12c84aace59dd9cf0dcac2b)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestZKAndFSPermissions.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting the "hbase.superuser" value on the znode, which 
> will cause an issue when multiple values are configured. In "hbase.superuser", 
> multiple superusers and supergroups can be configured, separated by commas. We 
> need to iterate over them and set an ACL for each.
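A minimal sketch of the comma-splitting fix described above (an assumption about shape, not the committed patch):
{code}
// Hedged sketch: split the configured value and grant each entry its own ACL.
String superUsers = zkw.getConfiguration().get("hbase.superuser");
ArrayList<ACL> acls = new ArrayList<ACL>();
if (superUsers != null) {
  for (String user : superUsers.split(",")) {
    user = user.trim();
    if (!user.isEmpty()) {
      acls.add(new ACL(Perms.ALL, new Id("auth", user)));
    }
  }
}
{code}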



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977696#comment-14977696
 ] 

Heng Chen commented on HBASE-14004:
---

{quote}
When WAL sync fails and the master has to roll back the Memstore, 
we can record this action in ZK or a system table; meanwhile all slaves should 
sync this action and modify their memstores.
{quote}

Of course, before a slave rolls back its memstore, it should check the timestamp of 
the rollback action against the WALs to be replayed.

> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (perhaps partially) been transported to the DataNodes, where it finally gets 
> persisted. As a result, the handler will roll back the Memstore and the later 
> flushed HFile will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977691#comment-14977691
 ] 

Heng Chen commented on HBASE-14004:
---

I have a proposal.
When WAL sync fails and the master has to roll back the Memstore, 
we can record this action in ZK or a system table; meanwhile all slaves should 
sync this action and modify their memstores.

Any concerns? 
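Purely as an illustration of the proposal, a hedged sketch of how such a rollback marker might be published; the znode path, payload layout, and connection string are all assumptions made for this sketch, not anything HBase does today:
{code}
// Hedged illustration only: publish a rollback marker that slave clusters could
// watch and check before replaying WALs. Parent znodes are assumed to exist.
void publishRollbackMarker(String regionEncodedName, long rollbackSeqId) throws Exception {
  ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
  try {
    String path = "/hbase/replication/rollback/" + regionEncodedName;
    byte[] payload = Bytes.toBytes(rollbackSeqId + ":" + System.currentTimeMillis());
    zk.create(path, payload, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  } finally {
    zk.close();
  }
}
{code}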



> [Replication] Inconsistency between Memstore and WAL may result in data in 
> remote cluster that is not in the origin
> ---
>
> Key: HBASE-14004
> URL: https://issues.apache.org/jira/browse/HBASE-14004
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: He Liangliang
>Priority: Critical
>  Labels: replication, wal
>
> Looks like the current write path can cause inconsistency between 
> memstore/hfile and WAL, which can leave the slave cluster with more data than 
> the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already 
> (perhaps partially) been transported to the DataNodes, where it finally gets 
> persisted. As a result, the handler will roll back the Memstore and the later 
> flushed HFile will also skip this record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977683#comment-14977683
 ] 

Hudson commented on HBASE-14674:


SUCCESS: Integrated in HBase-1.2 #314 (See 
[https://builds.apache.org/job/HBase-1.2/314/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev 5faf604c0b0d3d2a9598cf565c5b45b35458fd46)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After the pluggable RPC scheduler was introduced, the way the handler tasks 
> work changed. We no longer list idle RPC handlers in the tasks, but register 
> them dynamically with {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but idle handlers are not). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation of the MonitoredTask anymore, but instead allocate 
> one for every RPC call, breaking the pattern (see CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and is created in the RPC listener thread, the created task ends up 
> with the name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 
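A minimal sketch of the reuse pattern the javadoc describes, shown with hypothetical HandlerStatus and Call types rather than the real MonitoredRPCHandlerImpl, purely to illustrate the allocation concern:
{code}
// Hedged sketch: keep one mutable status object per handler thread and update it
// in place for each call, instead of allocating a new task object per RPC.
// HandlerStatus and Call are hypothetical stand-in types for this illustration.
private static final ThreadLocal<HandlerStatus> STATUS =
    new ThreadLocal<HandlerStatus>() {
      @Override protected HandlerStatus initialValue() {
        return new HandlerStatus(Thread.currentThread().getName());
      }
    };

void run(Call call) {
  HandlerStatus status = STATUS.get();   // reused, no per-call allocation
  status.setCall(call);                  // cheap field update, no string building
  try {
    // ... process the call ...
  } finally {
    status.markWaiting();                // back to "Waiting for a call"
  }
}
{code}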



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977681#comment-14977681
 ] 

Hudson commented on HBASE-14680:


SUCCESS: Integrated in HBase-1.2 #314 (See 
[https://builds.apache.org/job/HBase-1.2/314/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 688b772375c5cf0ee1d7c6694611513902adad3b)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used for disabled 
> table snapshots. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6 TB table, which is medium sized, fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the bigger one for backwards compatibility. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 
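Until the two keys are unified, a hedged workaround sketch is simply to raise both of them; the 10-minute value here is an arbitrary example, not a recommended default:
{code}
// Hedged sketch: bump both snapshot timeout keys named above while they coexist.
Configuration conf = HBaseConfiguration.create();
long tenMinutesMs = 10L * 60L * 1000L;
conf.setLong("hbase.snapshot.master.timeout.millis", tenMinutesMs); // disabled-table snapshot path
conf.setLong("hbase.snapshot.master.timeoutMillis", tenMinutesMs);  // procedure execution path
{code}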



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977682#comment-14977682
 ] 

Hudson commented on HBASE-14705:


SUCCESS: Integrated in HBase-1.2 #314 (See 
[https://builds.apache.org/job/HBase-1.2/314/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev 2b54a354171806c53c91dfe53b0ba628a15cecc0)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977676#comment-14977676
 ] 

Hudson commented on HBASE-14705:


SUCCESS: Integrated in HBase-1.3-IT #277 (See 
[https://builds.apache.org/job/HBase-1.3-IT/277/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev 53a8ce5fabc10c17905053330fc287ab7876a5b9)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977678#comment-14977678
 ] 

Hudson commented on HBASE-14709:


SUCCESS: Integrated in HBase-1.3-IT #277 (See 
[https://builds.apache.org/job/HBase-1.3-IT/277/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
d7c1468ed93b9d0d8788de346f2c72f292fe6c95)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977674#comment-14977674
 ] 

Hudson commented on HBASE-14680:


SUCCESS: Integrated in HBase-1.3-IT #277 (See 
[https://builds.apache.org/job/HBase-1.3-IT/277/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 87c97c231a9fdd9e3d86eb86c234ab46f58d02b6)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used for disabled 
> table snapshots. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6 TB table, which is medium sized, fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the bigger one for backwards compatibility. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977677#comment-14977677
 ] 

Hudson commented on HBASE-14674:


SUCCESS: Integrated in HBase-1.3-IT #277 (See 
[https://builds.apache.org/job/HBase-1.3-IT/277/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev 68f0fff281a7da6935fd328b9a692b77c6f559c3)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After the pluggable RPC scheduler was introduced, the way the handler tasks 
> work changed. We no longer list idle RPC handlers in the tasks, but register 
> them dynamically with {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but idle handlers are not). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation of the MonitoredTask anymore, but instead allocate 
> one for every RPC call, breaking the pattern (see CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and is created in the RPC listener thread, the created task ends up 
> with the name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977675#comment-14977675
 ] 

Hudson commented on HBASE-14425:


SUCCESS: Integrated in HBase-1.3-IT #277 (See 
[https://builds.apache.org/job/HBase-1.3-IT/277/])
HBASE-14425 In Secure Zookeeper cluster superuser will not have (enis: rev 
c174a54d87e14edd8c7cf039fecd4ce40066521b)
* hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestZKAndFSPermissions.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java


> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting the "hbase.superuser" value on the znode, which 
> will cause an issue when multiple values are configured. In "hbase.superuser", 
> multiple superusers and supergroups can be configured, separated by commas. We 
> need to iterate over them and set an ACL for each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977659#comment-14977659
 ] 

Hudson commented on HBASE-14705:


SUCCESS: Integrated in HBase-1.2-IT #247 (See 
[https://builds.apache.org/job/HBase-1.2-IT/247/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev 2b54a354171806c53c91dfe53b0ba628a15cecc0)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977660#comment-14977660
 ] 

Hudson commented on HBASE-14674:


SUCCESS: Integrated in HBase-1.2-IT #247 (See 
[https://builds.apache.org/job/HBase-1.2-IT/247/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev 5faf604c0b0d3d2a9598cf565c5b45b35458fd46)
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After the pluggable RPC scheduler was introduced, the way the handler tasks 
> work changed. We no longer list idle RPC handlers in the tasks, but register 
> them dynamically with {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but idle handlers are not). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation of the MonitoredTask anymore, but instead allocate 
> one for every RPC call, breaking the pattern (see CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and is created in the RPC listener thread, the created task ends up 
> with the name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977661#comment-14977661
 ] 

Hudson commented on HBASE-14709:


SUCCESS: Integrated in HBase-1.2-IT #247 (See 
[https://builds.apache.org/job/HBase-1.2-IT/247/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
6bd8bf1e23175fe731bedffa98bf6805e8e412da)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977658#comment-14977658
 ] 

Hudson commented on HBASE-14425:


SUCCESS: Integrated in HBase-1.2-IT #247 (See 
[https://builds.apache.org/job/HBase-1.2-IT/247/])
HBASE-14425 In Secure Zookeeper cluster superuser will not have (enis: rev 
4a9984da439152ccfb967f1f204b49029c8f6324)
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestZKAndFSPermissions.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKUtil.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java


> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting the "hbase.superuser" value on the znode, which 
> will cause an issue when multiple values are configured. In "hbase.superuser", 
> multiple superusers and supergroups can be configured, separated by commas. We 
> need to iterate over them and set an ACL for each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977657#comment-14977657
 ] 

Hudson commented on HBASE-14680:


SUCCESS: Integrated in HBase-1.2-IT #247 (See 
[https://builds.apache.org/job/HBase-1.2-IT/247/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 688b772375c5cf0ee1d7c6694611513902adad3b)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used for disabled 
> table snapshots. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6 TB table, which is medium sized, fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the bigger one for backwards compatibility. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977632#comment-14977632
 ] 

Enis Soztutar commented on HBASE-14468:
---

This is a good idea. We should add this to the list of compaction policies, with 
good documentation. We have use cases where there is a TTL of a couple of days; 
a metrics store for raw data in a high-ingest scenario is one such example. 

For the patch itself, the first if is not needed if we are checking for the 
DisabledRSP anyway: 
{code}
+
if(splitPolicyClassName.equals(IncreasingToUpperBoundRegionSplitPolicy.class.getName())){
+  throw new RuntimeException("Default split policy for FIFO compaction"+
+  " is not supported, aborting.");
+} else if( 
!splitPolicyClassName.equals(DisabledRegionSplitPolicy.class.getName())){
+  warn.append(":region splits must be disabled:");
+} 
{code}

Can we make it so that if a split happens we still compact the reference files, 
but we do not compact otherwise? We can also allow very-slow splits in the case 
where the reference files will be cleaned out due to TTL. In this case, a 
region can still split every TTL interval. 

Will the RuntimeExceptions thrown cause region opening to fail or the RS to 
abort? Can we hook the verification code into 
{{HMaster.sanityCheckTableDescriptor()}}, so that you cannot alter or create a 
table with those settings? This would make for a much better user experience. 

Can we also simplify the configuration for these? Maybe we auto-disable 
major compactions, and set the blocking store files if they are not set? 

Can we use HStore.removeUnneededFiles() or 
{{storeEngine.getStoreFileManager()}}, which already implement the is-expired 
logic, so that there is no duplication there? 

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> The FIFO compaction policy selects only files in which all cells have expired. The 
> column family MUST have a non-default TTL. 
> Essentially, the FIFO compactor does only one job: it collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data with a low TTL that is the source of 
> other data (after additional processing). Example: raw time-series vs. 
> time-based rollup aggregates and compacted time-series. We collect raw 
> time-series and store them in a CF with the FIFO compaction policy; 
> periodically we run a task which creates rollup aggregates and compacted 
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD). 
> Say we have a local SSD (1TB) which we can use as a block cache. No need for 
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and 
> network), and we do not evict hot data from the block cache. The result: improved 
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly 
> setting DisabledRegionSplitPolicy or by setting 
> ConstantSizeRegionSplitPolicy with a very large max region size). You will 
> also have to increase the store's blocking file number, 
> *hbase.hstore.blockingStoreFiles*, to a very large value.
>  
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
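A hedged sketch of the table setup described above (disable splits and raise the blocking file count); the values shown are placeholders, not recommendations:
{code}
// Hedged sketch of the suggested table setup, complementing the snippets above.
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
    FIFOCompactionPolicy.class.getName());
// Disable region splits so reference files never need compacting.
desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
// Raise the blocking store file count; 1000 is just an illustrative value.
desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
{code}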



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977622#comment-14977622
 ] 

Hadoop QA commented on HBASE-13408:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12769116/HBASE-13408-trunk-v08.patch
  against master branch at commit 0e6dd3257b1bebe3e12c84aace59dd9cf0dcac2b.
  ATTACHMENT ID: 12769116

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 74 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 5 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1735 checkstyle errors (more than the master's current 1732 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ * in a single read-only component. The “old” components are discarded 
when no scanner is reading
+  boolean isReadOnly, Durability durability, WAL wal, boolean[] 
compactedMemStore, byte[]... families)
+   * {@link HBaseTestingUtility#createWal(Configuration, Path, 
org.apache.hadoop.hbase.HRegionInfo)} because that method
+  public static final TableName TABLENAME = 
TableName.valueOf("TestWalAndCompactedMemstoreFlush", "t1");
++ ", CompactedMemStore DEEP_OVERHEAD_PER_PIPELINE_ITEM is:" + 
CompactedMemStore.DEEP_OVERHEAD_PER_PIPELINE_ITEM
++ ", the smallest sequence in CF1:" + smallestSeqCF1PhaseII + ", the 
smallest sequence in CF2:"
++ ", the smallest sequence in CF1:" + smallestSeqCF1PhaseIV + ", the 
smallest sequence in CF2:"
++ ". After additional inserts and last flush, the entire region size 
is:" + region.getMemstoreSize()
++ ", the smallest sequence in CF1:" + smallestSeqCF1PhaseIII + ", the 
smallest sequence in CF2:"
++ ", the smallest sequence in CF1:" + smallestSeqCF1PhaseIV + ", the 
smallest sequence in CF2:"

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16252//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16252//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16252//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16252//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16252//console

This message is automatically generated.

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate e

[jira] [Commented] (HBASE-14687) Un-synchronize BufferedMutator

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977620#comment-14977620
 ] 

Hadoop QA commented on HBASE-14687:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769105/HBASE-14687-v3.patch
  against master branch at commit 0e6dd3257b1bebe3e12c84aace59dd9cf0dcac2b.
  ATTACHMENT ID: 12769105

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1734 checkstyle errors (more than the master's current 1732 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16251//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16251//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16251//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16251//console

This message is automatically generated.

> Un-synchronize BufferedMutator
> --
>
> Key: HBASE-14687
> URL: https://issues.apache.org/jira/browse/HBASE-14687
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Attachments: HBASE-14687-v1.patch, HBASE-14687-v2.patch, 
> HBASE-14687-v3.patch, HBASE-14687.patch
>
>
> It should totally be possible to make BufferedMutatorImpl not use much 
> locking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14678) Experiment: Temporarily disable balancer and a few others to see if root of crashed/timedout JVMs

2015-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977618#comment-14977618
 ] 

Heng Chen commented on HBASE-14678:
---

{quote}
Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
{quote}

I notice this failure came after the revert of HBASE-14684 in branch-1. 
IMO we could at least remove MiniMRCluster from {{TestHFileOutputFormat}}.

> Experiment: Temporarily disable balancer and a few others to see if root of 
> crashed/timedout JVMs
> -
>
> Key: HBASE-14678
> URL: https://issues.apache.org/jira/browse/HBASE-14678
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>
> Looking at recent builds of 1.2, I see a few of the runs finishing with kills 
> and notice that a JVM exited without reporting back state. Running the 
> hanging test finder, I can see that in at least one case the balancer 
> tests seem to be outstanding; looking in the test output, they seem to be still 
> going on. A few others are reported as hung, but they look like they had just 
> started running and were simply killed by surefire.
> This issue is about trying to disable a few of the problematic tests, like the 
> balancer tests, to see if our overall stability improves. If so, I can 
> concentrate on stabilizing these few tests. Else I will just undo the 
> experiment and put the tests back on line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed

2015-10-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-12769:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   2.0.0

> Replication fails to delete all corresponding zk nodes when peer is removed
> ---
>
> Key: HBASE-12769
> URL: https://issues.apache.org/jira/browse/HBASE-12769
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 0.99.2
>Reporter: Jianwei Cui
>Assignee: Jianwei Cui
>Priority: Minor
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, 12769-v5.txt, 
> 12769-v6.txt, HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch
>
>
> When removing a peer, the client side will delete the peerId under the 
> peersZNode node; then live region servers will be notified and delete the 
> corresponding hlog queues under their replication rsZNodes. However, if there 
> are failed servers whose hlog queues have not been transferred by live 
> servers (this likely happens if "replication.sleep.before.failover" is set to 
> a big value and lots of region servers restarted), these hlog queues won't be 
> deleted after the peer is removed. I think remove_peer should guarantee that 
> all corresponding zk nodes have been removed after it completes; otherwise, if 
> we create a new peer with the same peerId as the removed one, there might be 
> unexpected data replicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14689) Addendum and unit test for HBASE-13471

2015-10-27 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14689:
--
Attachment: hbase-14689_v1-branch-1.1.patch

Reattach

> Addendum and unit test for HBASE-13471
> --
>
> Key: HBASE-14689
> URL: https://issues.apache.org/jira/browse/HBASE-14689
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14689_v1-branch-1.1.patch, 
> hbase-14689_v1-branch-1.1.patch, hbase-14689_v1.patch
>
>
> One of our customers ran into HBASE-13471, which resulted in all the handlers 
> getting blocked and various other issues. While backporting the issue, I 
> noticed that there is one more case where we might go into an infinite loop. If 
> a row lock cannot be acquired (due to a previous leak, for example, which 
> we have seen in Phoenix before), this will cause a similar infinite loop. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-10-27 Thread churro morales (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977596#comment-14977596
 ] 

churro morales commented on HBASE-14355:


[~stack] The tests all pass; what do you think about the latest patch?  I 
generate the protobuf code with the maven target, which is causing the 
lineLength issue. Any ideas on how to fix that, or do we just ignore it for 
protobuf auto-generated code?  If you are happy with this patch I'll get one 
up for branch-1 and 0.98.  Thanks for the review. 

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.16
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v2.patch, 
> HBASE-14355-v3.patch, HBASE-14355-v4.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store the ranges in a Map keyed by family.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently know which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use, 
> which also appears to be a workaround for not having the family available). 
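A minimal sketch of the proposed client-side bookkeeping, with a hypothetical FamilyTimeRange holder standing in for the real TimeRange class; this illustrates the proposal only, not the committed API:
{code}
// Hedged sketch: per-family ranges override the table-level range when present.
// FamilyTimeRange is a hypothetical holder used only for this illustration.
final class FamilyTimeRange {
  final long min;
  final long max;
  FamilyTimeRange(long min, long max) { this.min = min; this.max = max; }
}

NavigableMap<byte[], FamilyTimeRange> familyRanges =
    new TreeMap<byte[], FamilyTimeRange>(Bytes.BYTES_COMPARATOR);
FamilyTimeRange tableLevelRange = new FamilyTimeRange(0L, Long.MAX_VALUE);

// Proposed setter shape: record a range for one family.
byte[] family = Bytes.toBytes("cf1");
familyRanges.put(family, new FamilyTimeRange(0L, 1445990400000L));

// When executing the scan, prefer the family range, else fall back to the table level.
FamilyTimeRange effective =
    familyRanges.containsKey(family) ? familyRanges.get(family) : tableLevelRange;
{code}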



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977594#comment-14977594
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-1.3 #315 (See 
[https://builds.apache.org/job/HBase-1.3/315/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
f6a30d2331f45e0fb8e13398bd56d0b9f71ed6ea)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As what has been done in HBASE-14631 and HBASE-14605, the scope of calling 
> doAs() for compaction related region observer notifications should be 
> narrowed.
> User object is passed from CompactSplitThread down to the methods where 
> region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14696) Support setting allowPartialResults in mapreduce Mappers

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977593#comment-14977593
 ] 

Hudson commented on HBASE-14696:


FAILURE: Integrated in HBase-1.3 #315 (See 
[https://builds.apache.org/job/HBase-1.3/315/])
HBASE-14696 Support setting allowPartialResults in mapreduce Mappers (tedyu: 
rev 8fc9c2803f1a27cde6b6ee5906bb7289410e6e86)
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* hbase-protocol/src/main/protobuf/Client.proto


> Support setting allowPartialResults in mapreduce Mappers
> 
>
> Key: HBASE-14696
> URL: https://issues.apache.org/jira/browse/HBASE-14696
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Mindaugas Kairys
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 14696-branch-1-v1.txt, 14696-branch-1-v2.txt, 
> 14696-branch-1-v2.txt, 14696-v1.txt, 14696-v2.txt
>
>
> It is currently impossible to get partial results in mapreduce mapper jobs.
> When setting setAllowPartialResults(true) for scan jobs, they still fail with 
> OOME on large rows.
> The reason is that Scan field allowPartialResults is lost during job creation:
>   1. User creates a Job and sets a scan object via 
> TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...) -> which puts 
> a result of TableMapReduceUtil.convertScanToString(scanObj) to the job config.
>   2. When the job starts - method TableInputFormat.setConfig retrieves a scan 
> string from config and converts it to Scan object by calling 
> TableMapReduceUtil.convertStringToScan - which results in a Scan object with 
> a field allowPartialResults always set to false.
> I have experimented with modifying the TableInputFormat method setConfig() to 
> force all scans to allow partial results. After this change all jobs 
> succeeded with no more OOMEs, and I also noticed that mappers began to get 
> partial results (Result.isPartial()).
> My use case is very simple - I just have large rows and expect a mapper to 
> get them partially, i.e. to get the same rowid several times with different 
> key/value records.
> This would save me from implementing my own result partitioning solution, 
> which I would otherwise need because the large number of key/values for a 
> single row cannot be returned transparently.
> And from the other side - if a Scan object can return several records for the 
> same rowid (partial results), perhaps the mapper should do the same.
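
For illustration, a minimal sketch of the client-side setup described above (job, table and mapper names are hypothetical); the point is that the allowPartialResults flag set here used to be dropped when the Scan was serialized into the job configuration:

{code}
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "large-row-scan");   // hypothetical job name

Scan scan = new Scan();
scan.setAllowPartialResults(true);  // the flag that was lost before this fix
scan.setCaching(100);

// The Scan is serialized into the job configuration here; before this fix the
// convertScanToString/convertStringToScan round trip did not preserve the
// allowPartialResults flag.
TableMapReduceUtil.initTableMapperJob(
    "my_table",                     // hypothetical table name
    scan,
    LargeRowMapper.class,           // hypothetical TableMapper subclass
    ImmutableBytesWritable.class,
    Result.class,
    job);
{code}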



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977581#comment-14977581
 ] 

Hudson commented on HBASE-14674:


FAILURE: Integrated in HBase-1.1 #727 (See 
[https://builds.apache.org/job/HBase-1.1/727/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev b1c24e1da9d29402a08789785f4edc1ab1a188a8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After pluggable RPC scheduler, the way the tasks work for the handlers got 
> changed. We no longer list idle RPC handlers in the tasks, but we register 
> them dynamically to {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but not idle handlers). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation for the MonitoredTask anymore, but instead allocate 
> one for every RPC call breaking the pattern (See CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and created in the RPC listener thread, the created task ends up 
> having a name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 
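
For illustration, a rough sketch (not the committed patch) of the allocation pattern the javadoc implies: one MonitoredRPCHandler per handler thread, reused across calls, instead of a new status object for every CallRunner:

{code}
// Sketch only: reuse one status object per handler thread so the per-call
// path does no allocation, as MonitoredRPCHandlerImpl intends.
private static final ThreadLocal<MonitoredRPCHandler> STATUS =
    new ThreadLocal<MonitoredRPCHandler>() {
      @Override
      protected MonitoredRPCHandler initialValue() {
        MonitoredRPCHandler status =
            TaskMonitor.get().createRPCStatus(Thread.currentThread().getName());
        status.pause("Waiting for a call");
        return status;
      }
    };

public MonitoredRPCHandler getStatus() {
  // allocated once per handler thread, then reused for every call it runs
  return STATUS.get();
}
{code}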



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14695) Fix some easy HTML warnings

2015-10-27 Thread Misty Stanley-Jones (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misty Stanley-Jones updated HBASE-14695:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master.

> Fix some easy HTML warnings
> ---
>
> Key: HBASE-14695
> URL: https://issues.apache.org/jira/browse/HBASE-14695
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-14695.patch
>
>
> There are a few links to top-level pages that are missing the trailing /, and 
> a few places where we link to pages like replication.html even though we have 
> a single-page book now. There are also a few broken links in the API docs, 
> probably due to code we have specifically filtered out of the Javadoc:
> {code}
> #
> # ERROR   4 files had broken links
> #
> /devapidocs/org/apache/hadoop/hbase/wal/NamespaceGroupingStrategy.html
> had 1 broken link
> 
> /devapidocs/org/apache/hadoop/hbase/wal/RegionGroupingProvider.RegionGroupingStrategy.html
> /devapidocs/org/apache/hadoop/hbase/wal/package-tree.html
> had 1 broken link
> 
> /devapidocs/org/apache/hadoop/hbase/wal/RegionGroupingProvider.RegionGroupingStrategy.html
> /devapidocs/overview-tree.html
> had 2 broken links
> 
> /devapidocs/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.SeekerState.html
> 
> /devapidocs/org/apache/hadoop/hbase/wal/RegionGroupingProvider.RegionGroupingStrategy.html
> /devapidocs/serialized-form.html
> had 26 broken links
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.DescriptorProto.ExtensionRange.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.DescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.EnumDescriptorProto.html
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.EnumOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.EnumValueDescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.EnumValueOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.FieldDescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.FieldOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.FileDescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.FileDescriptorSet.html
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.FileOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.MessageOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.MethodDescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.MethodOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.ServiceDescriptorProto.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.ServiceOptions.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.SourceCodeInfo.Location.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.SourceCodeInfo.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.UninterpretedOption.NamePart.html
> 
> /devapidocs/src-html/com/google/protobuf/DescriptorProtos.UninterpretedOption.html
> /devapidocs/src-html/com/google/protobuf/GeneratedMessage.html
> /devapidocs/src-html/com/google/protobuf/GeneratedMessageLite.html
> /devapidocs/src-html/org/apache/hadoop/hbase/spark/ColumnFilter$.html
> /devapidocs/src-html/org/apache/hadoop/hbase/spark/HBaseContext$.html
> /devapidocs/src-html/org/apache/hadoop/hbase/spark/RowKeyFilter$.html
> 
> /devapidocs/src-html/org/apache/hadoop/hbase/spark/SchemaQualifierDefinition$.html
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977582#comment-14977582
 ] 

Hudson commented on HBASE-14709:


FAILURE: Integrated in HBase-1.1 #727 (See 
[https://builds.apache.org/job/HBase-1.1/727/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
ef26573bdf135b8f070ad7fc47dac062e6340d7c)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977580#comment-14977580
 ] 

Hudson commented on HBASE-14705:


FAILURE: Integrated in HBase-1.1 #727 (See 
[https://builds.apache.org/job/HBase-1.1/727/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev 61d913dd0de04683f021be94a5b6f15bc903a77f)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.
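
For illustration, one possible wording for the corrected javadoc (a suggestion only, not necessarily the attached patch):

{code}
  /**
   * Constructs a KeyValue structure as a Put with the latest timestamp and the
   * given value (the value is not filled with null).
   * @param row row key (arbitrary byte array)
   * @param family family name
   * @param qualifier column qualifier
   * @param value column value
   */
  public KeyValue(final byte [] row, final byte [] family,
      final byte [] qualifier, final byte [] value) {
    this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, value);
  }
{code}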



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977579#comment-14977579
 ] 

Hudson commented on HBASE-14680:


FAILURE: Integrated in HBase-1.1 #727 (See 
[https://builds.apache.org/job/HBase-1.1/727/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 28027138c3f8b6fa40e51626993e049814606ef2)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used in disabled 
> table snapshot. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6T table which is medium sized fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 
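
For illustration, a workaround sketch until the two keys are unified: raise both of the property names quoted above (the 10-minute value here is arbitrary):

{code}
Configuration conf = HBaseConfiguration.create();
long tenMinutesMs = 10L * 60 * 1000;
// key read by SnapshotDescriptionUtils (disabled-table snapshots)
conf.setLong("hbase.snapshot.master.timeout.millis", tenMinutesMs);
// key read by SnapshotManager (procedure execution timeout)
conf.setLong("hbase.snapshot.master.timeoutMillis", tenMinutesMs);
{code}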



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed

2015-10-27 Thread Jianwei Cui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianwei Cui updated HBASE-12769:

Attachment: 12769-v6.txt

> Replication fails to delete all corresponding zk nodes when peer is removed
> ---
>
> Key: HBASE-12769
> URL: https://issues.apache.org/jira/browse/HBASE-12769
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 0.99.2
>Reporter: Jianwei Cui
>Assignee: Jianwei Cui
>Priority: Minor
> Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, 12769-v5.txt, 
> 12769-v6.txt, HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch
>
>
> When removing a peer, the client side will delete the peerId under the 
> peersZNode node; the alive region servers will then be notified and delete the 
> corresponding hlog queues under their replication rsZNodes. However, if there 
> are failed servers whose hlog queues have not been transferred by alive 
> servers (this likely happens if "replication.sleep.before.failover" is set to 
> a big value and lots of region servers restarted), these hlog queues won't be 
> deleted after the peer is removed. I think remove_peer should guarantee that 
> all corresponding zk nodes have been removed after it completes; otherwise, if 
> we create a new peer with the same peerId as the removed one, there might be 
> unexpected data to be replicated.
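
For illustration, a sketch of the kind of cleanup remove_peer would need, written against the plain ZooKeeper client inside a method that declares throws Exception; the quorum, peer id and znode layout shown here follow the description and are otherwise assumptions:

{code}
// Sketch only: delete any leftover hlog queues for a removed peer under every
// region server znode (layout: /hbase/replication/rs/<rs>/<peerId>/<hlog>).
ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
  @Override
  public void process(WatchedEvent event) { /* no-op for this sketch */ }
});
String rsRoot = "/hbase/replication/rs";
String removedPeerId = "1";   // hypothetical peer id
for (String rs : zk.getChildren(rsRoot, false)) {
  String queue = rsRoot + "/" + rs + "/" + removedPeerId;
  if (zk.exists(queue, false) != null) {
    for (String hlog : zk.getChildren(queue, false)) {
      zk.delete(queue + "/" + hlog, -1);
    }
    zk.delete(queue, -1);
  }
}
{code}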



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader

2015-10-27 Thread Hiroshi Ikeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977559#comment-14977559
 ] 

Hiroshi Ikeda commented on HBASE-14479:
---

I have an idea: under low load, a simple scheduler backed by a task queue could 
execute tasks mostly on the calling thread, instead of preparing an exclusive 
thread pool in RpcExecutor.

Pseudo code:
{code}
void RpcScheduler.dispatch(callRunner) {
  queue.offer(callRunner);

  if (threadsExecutingTasks < MAX_THREADS_EXECUTING_TASKS) {
    threadsExecutingTasks++;
    while ((task = queue.poll()) != null) {
      execute(task);
    }
    // In most cases under low load, this executes the one task just added.
    threadsExecutingTasks--;
  }
}
{code}

This is based on the assumption that we can borrow some threads from RpcServer 
for a while.
In the actual code, I would use AtomicLong to manage the number of threads and 
tasks.
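
For illustration, a slightly more concrete sketch of the same idea with an atomic counter; the names and the bound are illustrative, not an actual RpcScheduler implementation:

{code}
private final Queue<CallRunner> queue = new ConcurrentLinkedQueue<CallRunner>();
private final AtomicLong threadsExecutingTasks = new AtomicLong();
private static final long MAX_THREADS_EXECUTING_TASKS = 4;  // arbitrary bound

public void dispatch(CallRunner callRunner) {
  queue.offer(callRunner);
  if (threadsExecutingTasks.incrementAndGet() <= MAX_THREADS_EXECUTING_TASKS) {
    try {
      CallRunner task;
      // Under low load this runs the task we just added, on the calling
      // (borrowed) thread; under higher load it also drains queued tasks.
      while ((task = queue.poll()) != null) {
        task.run();
      }
    } finally {
      threadsExecutingTasks.decrementAndGet();
    }
  } else {
    // Too many borrowed threads already executing; leave the task queued for
    // one of them to drain (the enqueue/drain race is glossed over here, as in
    // the pseudo code above).
    threadsExecutingTasks.decrementAndGet();
  }
}
{code}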



> Apply the Leader/Followers pattern to RpcServer's Reader
> 
>
> Key: HBASE-14479
> URL: https://issues.apache.org/jira/browse/HBASE-14479
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, Performance
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Attachments: HBASE-14479-V2 (1).patch, HBASE-14479-V2.patch, 
> HBASE-14479-V2.patch, HBASE-14479.patch, flamegraph-19152.svg, 
> flamegraph-32667.svg, gc.png, gets.png, io.png, median.png
>
>
> {{RpcServer}} uses multiple selectors to read data for load distribution, but 
> the distribution is just done by round-robin. It is uncertain, especially for 
> long run, whether load is equally divided and resources are used without 
> being wasted.
> Moreover, multiple selectors may cause excessive context switches which give 
> priority to low latency (while we just add the requests to queues), and it is 
> possible to reduce throughput of the whole server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin

2015-10-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977553#comment-14977553
 ] 

Enis Soztutar commented on HBASE-14511:
---

This is very good. 
bq. I'd like to use this for Phoenix to store min/max for some column 
qualifiers in the HFile itself. At scan time we can then efficiently rule out 
entire HFiles based on those (similar to how HBase does it with key ranges and 
timestamps) - that would be a cheap local secondary index. James Taylor, FYI.
Yes, this should be one of the goals. 
bq. Can we make this accessible through coprocessor hooks somehow (I'd need to 
think about this side, though).
We can do the co-processor way or the Plugin way. Adding these methods and 
functionality to RegionObserver is definitely one option. 
If we are doing the plugin way as in the patch, we should allow plugins to be 
defined per table. 

If we are doing the plugin way (rather than coprocessors), can we please pass 
Context objects instead of passing the parameters directly (Configuration, 
Writer, etc.). This becomes important since we do not want to break Phoenix and 
other third-party plugins if we decide to pass new parameters in the future. An 
example of this pattern is: 
{code}
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.REPLICATION)
public interface ReplicationEndpoint extends Service {
  @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.REPLICATION)
  class Context {
  ...
  }
  void init(Context context) throws IOException;

  @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.REPLICATION)
  static class ReplicateContext {
  ..
  }
  boolean replicate(ReplicateContext replicateContext);
}
{code}

Why are {{MetaWriter}} and {{Plugin}} different classes? We can just have a 
single Plugin class, no? 

It is a nit, but the HBase convention is to put the opening curly bracket "{" on 
the same line for class and method definitions.  
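
For illustration, a hypothetical shape for such a plugin API following the Context pattern quoted above; none of these names are in the patch, they only sketch the idea (with the brace style noted above):

{code}
public interface StoreFileMetaPlugin {

  // Parameters travel inside a Context so new ones can be added later without
  // breaking existing implementations (Phoenix, etc.).
  class Context {
    private Configuration conf;
    private StoreFile.Writer writer;

    public Configuration getConfiguration() { return conf; }
    public StoreFile.Writer getWriter() { return writer; }
  }

  void init(Context context) throws IOException;

  // Kept cheap: invoked for every cell the writer appends.
  void cellWritten(Cell cell);

  // Invoked before close, so the plugin can append its meta entries.
  void beforeClose(Context context) throws IOException;
}
{code}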





> StoreFile.Writer Meta Plugin
> 
>
> Key: HBASE-14511
> URL: https://issues.apache.org/jira/browse/HBASE-14511
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14511-v3.patch, HBASE-14511.v1.patch, 
> HBASE-14511.v2.patch
>
>
> During my work on a new compaction policies (HBASE-14468, HBASE-14477) I had 
> to modify the existing code of a StoreFile.Writer to add additional meta-info 
> required by these new  policies. I think that it should be done by means of a 
> new Plugin framework, because this seems to be a general capability/feature. 
> As a future enhancement this can become a part of a more general 
> StoreFileWriter/Reader plugin architecture. But I need only Meta section of a 
> store file.
> This could be used, for example, to collect rowkeys distribution information 
> during hfile creation. This info can be used later to find the optimal region 
> split key or to create optimal set of sub-regions for M/R jobs or other jobs 
> which can operate on a sub-region level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977543#comment-14977543
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1123 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1123/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
fc0aa1e7b2612f3045138e11bdef506638842bfb)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As what has been done in HBASE-14631 and HBASE-14605, the scope of calling 
> doAs() for compaction related region observer notifications should be 
> narrowed.
> User object is passed from CompactSplitThread down to the methods where 
> region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977542#comment-14977542
 ] 

Hudson commented on HBASE-14705:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1123 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1123/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev fd18723e37e358a5287575feab84da258f074337)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977541#comment-14977541
 ] 

Hudson commented on HBASE-14680:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1123 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1123/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 82464eacb837f1f74d3937da6e291a7add4c8c3a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used in disabled 
> table snapshot. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6T table which is medium sized fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14706) RegionLocationFinder should return multiple servername by top host

2015-10-27 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-14706:
---
Affects Version/s: 1.3.0
   1.2.0

> RegionLocationFinder should return multiple servername by top host
> --
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-14706.patch
>
>
> Multiple RSs can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}
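
For illustration, a minimal sketch of the direction the summary suggests (not necessarily the attached patch): map each host to all ServerNames running on it, rather than to a single one:

{code}
// create a mapping from hostname to every ServerName running on that host
Map<String, List<ServerName>> hostToServerNames =
    new HashMap<String, List<ServerName>>();
for (ServerName sn : regionServers) {
  List<ServerName> serverNames = hostToServerNames.get(sn.getHostname());
  if (serverNames == null) {
    serverNames = new ArrayList<ServerName>();
    hostToServerNames.put(sn.getHostname(), serverNames);
  }
  serverNames.add(sn);
}
{code}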



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14706) RegionLocationFinder should return multiple servername by top host

2015-10-27 Thread Guanghao Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-14706:
---
Attachment: HBASE-14706.patch

> RegionLocationFinder should return multiple servername by top host
> --
>
> Key: HBASE-14706
> URL: https://issues.apache.org/jira/browse/HBASE-14706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 2.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-14706.patch
>
>
> Multiple RSs can run on the same host. But in the current RegionLocationFinder, 
> mapHostNameToServerName maps one host to only one server. This makes 
> LocalityCostFunction compute the wrong locality for a region.
> {code}
> // create a mapping from hostname to ServerName for fast lookup
> HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
> for (ServerName sn : regionServers) {
>   hostToServerName.put(sn.getHostname(), sn);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977520#comment-14977520
 ] 

Hudson commented on HBASE-14709:


FAILURE: Integrated in HBase-1.0 #1103 (See 
[https://builds.apache.org/job/HBase-1.0/1103/])
HBASE-14709 Parent change breaks graceful_stop.sh on a cluster (stack: rev 
fe7cd0dee5fd8eb4b3077a0f6c2457929646d61c)
* bin/graceful_stop.sh


> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977519#comment-14977519
 ] 

Hudson commented on HBASE-14674:


FAILURE: Integrated in HBase-1.0 #1103 (See 
[https://builds.apache.org/job/HBase-1.0/1103/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev ccded0bd6abb635613cd7801245ab15e8e090b0b)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After pluggable RPC scheduler, the way the tasks work for the handlers got 
> changed. We no longer list idle RPC handlers in the tasks, but we register 
> them dynamically to {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but not idle handlers). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation for the MonitoredTask anymore, but instead allocate 
> one for every RPC call breaking the pattern (See CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and created in the RPC listener thread, the created task ends up 
> having a name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977518#comment-14977518
 ] 

Hudson commented on HBASE-14705:


FAILURE: Integrated in HBase-1.0 #1103 (See 
[https://builds.apache.org/job/HBase-1.0/1103/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev dd06ddec63f580ff59494d9c5993f40804a10a29)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977517#comment-14977517
 ] 

Hudson commented on HBASE-14680:


FAILURE: Integrated in HBase-1.0 #1103 (See 
[https://builds.apache.org/job/HBase-1.0/1103/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 25b4427ee8f36f78174141816804a8e7b2d76d9f)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used in disabled 
> table snapshot. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6T table which is medium sized fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977472#comment-14977472
 ] 

Hudson commented on HBASE-14655:


SUCCESS: Integrated in HBase-1.3-IT #276 (See 
[https://builds.apache.org/job/HBase-1.3-IT/276/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
f6a30d2331f45e0fb8e13398bd56d0b9f71ed6ea)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As what has been done in HBASE-14631 and HBASE-14605, the scope of calling 
> doAs() for compaction related region observer notifications should be 
> narrowed.
> User object is passed from CompactSplitThread down to the methods where 
> region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14696) Support setting allowPartialResults in mapreduce Mappers

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977471#comment-14977471
 ] 

Hudson commented on HBASE-14696:


SUCCESS: Integrated in HBase-1.3-IT #276 (See 
[https://builds.apache.org/job/HBase-1.3-IT/276/])
HBASE-14696 Support setting allowPartialResults in mapreduce Mappers (tedyu: 
rev 8fc9c2803f1a27cde6b6ee5906bb7289410e6e86)
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-protocol/src/main/protobuf/Client.proto


> Support setting allowPartialResults in mapreduce Mappers
> 
>
> Key: HBASE-14696
> URL: https://issues.apache.org/jira/browse/HBASE-14696
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Mindaugas Kairys
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 14696-branch-1-v1.txt, 14696-branch-1-v2.txt, 
> 14696-branch-1-v2.txt, 14696-v1.txt, 14696-v2.txt
>
>
> It is currently impossible to get partial results in mapreduce mapper jobs.
> When setting setAllowPartialResults(true) for scan jobs, they still fail with 
> OOME on large rows.
> The reason is that Scan field allowPartialResults is lost during job creation:
>   1. User creates a Job and sets a scan object via 
> TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...) -> which puts 
> a result of TableMapReduceUtil.convertScanToString(scanObj) to the job config.
>   2. When the job starts - method TableInputFormat.setConfig retrieves a scan 
> string from config and converts it to Scan object by calling 
> TableMapReduceUtil.convertStringToScan - which results in a Scan object with 
> a field allowPartialResults always set to false.
> I have experimented with modifying the TableInputFormat method setConfig() to 
> force all scans to allow partial results. After this change all jobs 
> succeeded with no more OOMEs, and I also noticed that mappers began to get 
> partial results (Result.isPartial()).
> My use case is very simple - I just have large rows and expect a mapper to 
> get them partially, i.e. to get the same rowid several times with different 
> key/value records.
> This would save me from implementing my own result partitioning solution, 
> which I would otherwise need because the large number of key/values for a 
> single row cannot be returned transparently.
> And from the other side - if a Scan object can return several records for the 
> same rowid (partial results), perhaps the mapper should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v4.patch

Even more tests.

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v2.patch, HBASE-14708-v3.patch, 
> HBASE-14708-v4.patch, HBASE-14708.patch, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache. 
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It's got equal speed for 
> reading and writing.
> However most operations will not need to remove or add a region location. 
> There will be potentially several orders of magnitude more reads for cached 
> locations than there will be on clearing the cache.
> So I propose a copy on write tree map.
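
For illustration, a minimal sketch of the copy-on-write idea (the class in the attached patch is more complete): reads hit an immutable snapshot, writes copy the map and then atomically swap it in:

{code}
public class CopyOnWriteTreeMap<K extends Comparable<K>, V> {
  private final AtomicReference<TreeMap<K, V>> ref =
      new AtomicReference<TreeMap<K, V>>(new TreeMap<K, V>());

  public V get(K key) {
    return ref.get().get(key);           // lock-free read on the current snapshot
  }

  public Map.Entry<K, V> floorEntry(K key) {
    return ref.get().floorEntry(key);    // the lookup the location cache uses
  }

  public void put(K key, V value) {
    TreeMap<K, V> current;
    TreeMap<K, V> next;
    do {
      current = ref.get();
      next = new TreeMap<K, V>(current); // copy, mutate, then swap in
      next.put(key, value);
    } while (!ref.compareAndSet(current, next));
  }
}
{code}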



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977452#comment-14977452
 ] 

Hudson commented on HBASE-14680:


FAILURE: Integrated in HBase-TRUNK #6969 (See 
[https://builds.apache.org/job/HBase-TRUNK/6969/])
HBASE-14680 Two configs for snapshot timeout and better defaults (Heng (enis: 
rev 16ff57bea94645aae30ba9b6bf4375b2eec202f1)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/DisabledTableSnapshotHandler.java
* hbase-common/src/main/resources/hbase-default.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java


> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils and is sent to the client side and used in disabled 
> table snapshot. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6T table which is medium sized fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977453#comment-14977453
 ] 

Hudson commented on HBASE-14705:


FAILURE: Integrated in HBase-TRUNK #6969 (See 
[https://builds.apache.org/job/HBase-TRUNK/6969/])
HBASE-14705 Javadoc for KeyValue constructor is not correct (Jean-Marc 
(apurtell: rev dfa05284cfef985e806660de0a1415f0fa7c2211)
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14688) Cleanup MOB tests

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977455#comment-14977455
 ] 

Hudson commented on HBASE-14688:


FAILURE: Integrated in HBase-TRUNK #6969 (See 
[https://builds.apache.org/job/HBase-TRUNK/6969/])
HBASE-14688 Cleanup MOB tests (matteo.bertozzi: rev 
c91bfff5862fd38b3d301e3371d2f643d0c501ea)
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/BaseTestHBaseFsck.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mob/TestCachedMobFile.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMobStoreScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mob/TestMobFile.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mob/TestMobDataBlockEncoding.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mob/compactions/TestMobCompactor.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMobStoreCompaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mob/TestDefaultMobStoreFlusher.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mob/MobTestUtil.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mob/TestExpiredMobFileCleaner.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mob/compactions/TestPartitionedMobCompactor.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDeleteMobTable.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> Cleanup MOB tests
> -
>
> Key: HBASE-14688
> URL: https://issues.apache.org/jira/browse/HBASE-14688
> Project: HBase
>  Issue Type: Test
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-14688-v0.patch, HBASE-14688-v1.patch
>
>
> remove the copy-paste stuff and use MobUtil instead of redoing concatenation 
> and extraction by hand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977454#comment-14977454
 ] 

Hudson commented on HBASE-14674:


FAILURE: Integrated in HBase-TRUNK #6969 (See 
[https://builds.apache.org/job/HBase-TRUNK/6969/])
HBASE-14674 Rpc handler / task monitoring seems to be broken after 0.98 (enis: 
rev d5d81d675ace2d87c4ac19562b6b0a29da3d8902)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java


> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After pluggable RPC scheduler, the way the tasks work for the handlers got 
> changed. We no longer list idle RPC handlers in the tasks, but we register 
> them dynamically to {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but not idle handlers). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation for the MonitoredTask anymore, but instead allocate 
> one for every RPC call breaking the pattern (See CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC 
> object and created in the RPC listener thread, the created task ends up 
> having a name "listener" although the actual processing happens in a handler 
> thread. This is obviously very confusing during debugging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14675) Exorcise deprecated Put#add(...) and replace with Put#addColumn(...)

2015-10-27 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-14675:
---
Attachment: hbase-14675.v3.patch

v3 updated against latest master and fixed line lengths.

> Exorcise deprecated Put#add(...) and replace with Put#addColumn(...)
> 
>
> Key: HBASE-14675
> URL: https://issues.apache.org/jira/browse/HBASE-14675
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 2.0.0
>
> Attachments: hbase-14675.patch, hbase-14675.patch, 
> hbase-14675.v2.patch, hbase-14675.v3.patch
>
>
> The Put API changed from #add(...) to #addColumn(...).  This updates all 
> instances of it and removes it from the Put (which was added for hbase 1.0.0)
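
For illustration, the call-site change this implies (row, family and qualifier values here are arbitrary):

{code}
Put put = new Put(Bytes.toBytes("row1"));
// put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));      // old, removed by this change
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));   // replacement
{code}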



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977449#comment-14977449
 ] 

Hudson commented on HBASE-14655:


SUCCESS: Integrated in HBase-1.2-IT #246 (See 
[https://builds.apache.org/job/HBase-1.2-IT/246/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
10239fca617107a1a01301bb77bba9b98fb684a8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As what has been done in HBASE-14631 and HBASE-14605, the scope of calling 
> doAs() for compaction related region observer notifications should be 
> narrowed.
> User object is passed from CompactSplitThread down to the methods where 
> region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977437#comment-14977437
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-0.98 #1169 (See 
[https://builds.apache.org/job/HBase-0.98/1169/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
fc0aa1e7b2612f3045138e11bdef506638842bfb)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As what has been done in HBASE-14631 and HBASE-14605, the scope of calling 
> doAs() for compaction related region observer notifications should be 
> narrowed.
> User object is passed from CompactSplitThread down to the methods where 
> region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14425:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   1.2.0
   Status: Resolved  (was: Patch Available)

I've pushed this to 1.2+. Thanks Pankaj. 

> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting the "hbase.superuser" value on the znode, which 
> will cause an issue when multiple values are configured. In "hbase.superuser", 
> multiple superusers and supergroups can be configured, separated by commas. We 
> need to iterate over them and set an ACL for each.
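A minimal sketch of the fix the description calls for, assuming the same
ZooKeeper ACL classes used in the snippet above; the splitting and trimming
logic shown here is illustrative, not the committed patch:

{code}
// Illustrative sketch (not the exact patch): split "hbase.superuser" on commas
// and add one ACL entry per configured superuser/supergroup.
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class SuperUserAclSketch {
  static List<ACL> createSuperUserAcls(String superUserConf) {
    List<ACL> acls = new ArrayList<ACL>();
    if (superUserConf != null) {
      for (String user : superUserConf.split(",")) {
        String trimmed = user.trim();
        if (!trimmed.isEmpty()) {
          // one ACL per configured value instead of one ACL for the whole string
          acls.add(new ACL(Perms.ALL, new Id("auth", trimmed)));
        }
      }
    }
    return acls;
  }
}
{code}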



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14699) Replication crashes regionservers when hbase.wal.provider is set to multiwal

2015-10-27 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977425#comment-14977425
 ] 

Ashu Pachauri commented on HBASE-14699:
---

Here is an extract from the logs (I added some extra logging to debug; you won't 
find it in the code): 
{code}
15/10/27 15:21:20 INFO wal.FSHLog: Rolled WAL 
//,16020,1445984296081/%2C16020%2C1445984296081.null2.1445984298092
 with entries=15, filesize=13.96 KB; new WAL 
//,16020,1445984296081/%2C16020%2C1445984296081.null2.1445984480906
15/10/27 15:21:21 INFO regionserver.ReplicationSourceManager: Given key: 
%2C16020%2C1445984296081.null2.1445984480906, Deleting log: 
%2C16020%2C1445984296081.null0.1445984481007
15/10/27 15:21:21 INFO zookeeper.RecoverableZooKeeper: Deleting znode: 
/hbase/replication/rs/,16020,1445984296081/1/%2C16020%2C1445984296081.null0.1445984481007,
 version: -1
{code}

ReplicationSourceManager#cleanOldLogs cleans up logs older than a given key 
(log name) simply by sorting the names. That works for defaultwalprovider; 
however, for multiwal it breaks, because log0.newtimestamp (after rolling) would 
be deleted on the basis of being "older" than log2.oldtimestamp. The cleanup 
process should also take into account the timestamps of the individual logs, not 
just the sorted order of their names.
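To make the failure mode concrete, here is a small standalone illustration (not 
HBase code; the "server" prefix stands in for the elided host name) of why 
lexicographic ordering of multiwal log names misbehaves, and why comparing the 
trailing timestamps gives the intended ordering:

{code}
// Standalone illustration: lexicographic order vs. timestamp order for
// multiwal-style log names of the form <prefix>.null<group>.<timestamp>.
public class WalNameOrdering {
  static long timestampOf(String walName) {
    // the timestamp is the suffix after the last '.'
    return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
  }

  public static void main(String[] args) {
    String rolledGroup0 = "server%2C16020%2C1445984296081.null0.1445984481007"; // newer
    String keyGroup2    = "server%2C16020%2C1445984296081.null2.1445984480906"; // older

    // Lexicographic comparison: "null0..." < "null2...", so the newer group-0
    // log looks "older" than the group-2 key and would be cleaned up.
    System.out.println(rolledGroup0.compareTo(keyGroup2) < 0); // true -> wrongly deletable

    // Timestamp-aware comparison keeps the newer log.
    System.out.println(timestampOf(rolledGroup0) < timestampOf(keyGroup2)); // false -> kept
  }
}
{code}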

> Replication crashes regionservers when hbase.wal.provider is set to multiwal
> 
>
> Key: HBASE-14699
> URL: https://issues.apache.org/jira/browse/HBASE-14699
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
>
> When the hbase.wal.provider is set to multiwal and replication is enabled, 
> the regionservers start crashing with the following exception:
> {code}
> ,16020,1445495411258: Failed to write replication wal position 
> (filename=%2C16020%2C1445495411258.null0.1445495898373, 
> position=1322399)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/,16020,1445495411258/1/%2C16020%2C1445495411258.null0.1445495898373
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
>   at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:429)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:940)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:990)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:984)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.setLogPosition(ReplicationQueuesZKImpl.java:129)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:177)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:388)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977414#comment-14977414
 ] 

Matteo Bertozzi commented on HBASE-14708:
-

Looks good to me. Some time ago [~esteban] noticed the same SkipList 
bottleneck, I think even with a simple YCSB run.

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v2.patch, HBASE-14708-v3.patch, 
> HBASE-14708.patch, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache. 
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It's got equal speed for 
> reading and writing.
> However most operations will not need to remove or add a region location. 
> There will be potentially several orders of magnitude more reads for cached 
> locations than there will be on clearing the cache.
> So I propose a copy on write tree map.
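For illustration, a very small copy-on-write map sketch, not the
CopyOnWriteTreeMap from the attached patch: reads go to an immutable snapshot
without locking, while the rare mutations copy the underlying TreeMap under a
lock and swap the reference.

{code}
// Minimal copy-on-write map sketch for illustration only; the actual
// HBASE-14708 class is richer (full NavigableMap API, etc.).
import java.util.Map;
import java.util.TreeMap;

public class CowMapSketch<K extends Comparable<K>, V> {
  private volatile TreeMap<K, V> snapshot = new TreeMap<K, V>();

  // Reads never lock: they work against the current immutable snapshot.
  public V get(K key) {
    return snapshot.get(key);
  }

  public Map.Entry<K, V> floorEntry(K key) {
    return snapshot.floorEntry(key);
  }

  // Writes are rare for the location cache, so copying the whole map is fine.
  public synchronized void put(K key, V value) {
    TreeMap<K, V> copy = new TreeMap<K, V>(snapshot);
    copy.put(key, value);
    snapshot = copy;
  }

  public synchronized void remove(K key) {
    TreeMap<K, V> copy = new TreeMap<K, V>(snapshot);
    copy.remove(key);
    snapshot = copy;
  }
}
{code}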



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-14709.
---
   Resolution: Fixed
Fix Version/s: (was: 0.99.1)
   (was: 0.98.7)
   0.98.16
   1.1.3
   1.0.3
   1.3.0
   1.2.0

Pushed to branches 0.98+

> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v3.patch

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v2.patch, HBASE-14708-v3.patch, 
> HBASE-14708.patch, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache. 
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It's got equal speed for 
> reading and writing.
> However most operations will not need to remove or add a region location. 
> There will be potentially several orders of magnitude more reads for cached 
> locations than there will be on clearing the cache.
> So I propose a copy on write tree map.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14425) In Secure Zookeeper cluster superuser will not have sufficient permission if multiple values are configured in "hbase.superuser"

2015-10-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977395#comment-14977395
 ] 

Enis Soztutar commented on HBASE-14425:
---

+1. I'll commit shortly. 

> In Secure Zookeeper cluster superuser will not have sufficient permission if 
> multiple values are configured in "hbase.superuser"
> 
>
> Key: HBASE-14425
> URL: https://issues.apache.org/jira/browse/HBASE-14425
> Project: HBase
>  Issue Type: Bug
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-14425-V2.patch, HBASE-14425-V2.patch, 
> HBASE-14425.patch
>
>
> During master initialization we are setting ACLs for the znodes.
> In ZKUtil.createACL(ZooKeeperWatcher zkw, String node, boolean 
> isSecureZooKeeper),
> {code}
>   String superUser = zkw.getConfiguration().get("hbase.superuser");
>   ArrayList acls = new ArrayList();
>   // add permission to hbase supper user
>   if (superUser != null) {
> acls.add(new ACL(Perms.ALL, new Id("auth", superUser)));
>   }
> {code}
> Here we are directly setting the "hbase.superuser" value on the znode, which 
> will cause an issue when multiple values are configured. In "hbase.superuser", 
> multiple superusers and supergroups can be configured, separated by commas. We 
> need to iterate over them and set an ACL for each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14709) Parent change breaks graceful_stop.sh on a cluster

2015-10-27 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14709:
--
Attachment: rr.patch

Small patch. Tested on cluster. Going to commit.

> Parent change breaks graceful_stop.sh on a cluster
> --
>
> Key: HBASE-14709
> URL: https://issues.apache.org/jira/browse/HBASE-14709
> Project: HBase
>  Issue Type: Sub-task
>  Components: Operability
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 0.98.7, 0.99.1
>
> Attachments: rr.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977336#comment-14977336
 ] 

Gary Helmling edited comment on HBASE-14700 at 10/27/15 11:19 PM:
--

Patch against master to enable server-side fallback to simple auth:
* adds hbase.ipc.server.fallback-to-simple-auth-allowed (default=false) to 
enable clients sending auth method SIMPLE to continue through
* adds warning on startup when enabled
* adds metric for number of insecure fallbacks allowed
* adds test for insecure fallback
* makes RSRpcServices and RpcServer ConfigurationObservers so that the fallback 
property can be dynamically reconfigured


was (Author: ghelmling):
Patch against master to enable server-side fallback to simple auth:
* adds hbase.ipc.server.fallback-to-simple-auth-allowed (default=false) to 
enable clients sending auth method SIMPLE to continue through
* adds warning on startup when enabled
* adds metric for number of insecure fallbacks allowed
* adds test for insecure fallback

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 2.0.0
>
> Attachments: HBASE-14700.patch
>
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-14700:
--
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 2.0.0
>
> Attachments: HBASE-14700.patch
>
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-14700:
--
Attachment: HBASE-14700.patch

Patch against master to enable server-side fallback to simple auth:
* adds hbase.ipc.server.fallback-to-simple-auth-allowed (default=false) to 
enable clients sending auth method SIMPLE to continue through
* adds warning on startup when enabled
* adds metric for number of insecure fallbacks allowed
* adds test for insecure fallback
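As a usage sketch, assuming the property name above is what ships: the flag
would normally be set in the servers' hbase-site.xml, shown here
programmatically for brevity.

{code}
// Sketch only: enabling the server-side fallback property programmatically.
// In a real deployment this would be set in hbase-site.xml on the servers.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PermissiveModeConfigSketch {
  public static Configuration permissiveServerConf() {
    Configuration conf = HBaseConfiguration.create();
    // Secure cluster, but allow clients that still negotiate SIMPLE auth.
    conf.setBoolean("hbase.ipc.server.fallback-to-simple-auth-allowed", true);
    return conf;
  }
}
{code}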

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Attachments: HBASE-14700.patch
>
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v2.patch

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v2.patch, HBASE-14708.patch, 
> location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache. 
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It's got equal speed for 
> reading and writing.
> However most operations will not need to remove or add a region location. 
> There will be potentially several orders of magnitude more reads for cached 
> locations than there will be on clearing the cache.
> So I propose a copy on write tree map.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977330#comment-14977330
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-1.2 #313 (See 
[https://builds.apache.org/job/HBase-1.2/313/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
10239fca617107a1a01301bb77bba9b98fb684a8)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As was done in HBASE-14631 and HBASE-14605, the scope of the doAs() calls for 
> compaction-related region observer notifications should be narrowed.
> The User object is passed from CompactSplitThread down to the methods where 
> the region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13014) Java Tool For Region Moving

2015-10-27 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13014:
---
   Resolution: Fixed
Fix Version/s: (was: 0.98.16)
   (was: 1.3.0)
   Status: Resolved  (was: Patch Available)

The v2 patch for master applies and tests pass reliably on master.

I can also apply the v2 patch for master on branch-1, but the tests fail. As it 
stands, this patch can't go further back than master. 

Committed to master.

[~abhishek.chouhan] if you'd like to see this applied to other branches like 
branch-1 and 0.98, we'd gladly accept additional patches for the respective 
branches that pass tests.

> Java Tool For Region Moving 
> 
>
> Key: HBASE-13014
> URL: https://issues.apache.org/jira/browse/HBASE-13014
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0
>
> Attachments: HBASE-13014-master-v2.patch, HBASE-13014-master.patch, 
> HBASE-13014-v2.patch, HBASE-13014-v3.patch, HBASE-13014-v4.patch, 
> HBASE-13014-v5.patch, HBASE-13014-v6.patch, HBASE-13014.patch
>
>
> As per discussion on HBASE-12989 we should move the functionality of 
> region_mover.rb into a Java tool and use region_mover.rb only only as a 
> wrapper around it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977281#comment-14977281
 ] 

Gary Helmling commented on HBASE-14700:
---

Hi [~appy], yes, as you note, there are some problems with trying to use the 
client-side fallback configuration to achieve the same goal:
# Until security is actually configured on the server-side, you are not 
actually testing the client-side secure configuration.  This allows for 
configuration errors which are not discovered until the server-side security is 
enabled.
# Enabling server-side security would require a complete cluster shutdown, 
which may be undesirable.

I believe that allowing the fallback to be configured on and off on the 
server-side allows a more incremental approach to rollout, especially in cases 
where many clients may be using a cluster, each requiring their own changes and 
testing.

I think in theory this could even allow enabling secure configuration on the 
server-side on a rolling basis.

I'll post a patch for review shortly.

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-14699) Replication crashes regionservers when hbase.wal.provider is set to multiwal

2015-10-27 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri reassigned HBASE-14699:
-

Assignee: Ashu Pachauri

> Replication crashes regionservers when hbase.wal.provider is set to multiwal
> 
>
> Key: HBASE-14699
> URL: https://issues.apache.org/jira/browse/HBASE-14699
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Blocker
>
> When the hbase.wal.provider is set to multiwal and replication is enabled, 
> the regionservers start crashing with the following exception:
> {code}
> ,16020,1445495411258: Failed to write replication wal position 
> (filename=%2C16020%2C1445495411258.null0.1445495898373, 
> position=1322399)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/,16020,1445495411258/1/%2C16020%2C1445495411258.null0.1445495898373
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
>   at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:429)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:940)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:990)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:984)
>   at 
> org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.setLogPosition(ReplicationQueuesZKImpl.java:129)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:177)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:388)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14708) Use copy on write TreeMap for region location cache

2015-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977275#comment-14977275
 ] 

Hadoop QA commented on HBASE-14708:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769079/HBASE-14708.patch
  against master branch at commit d5d81d675ace2d87c4ac19562b6b0a29da3d8902.
  ATTACHMENT ID: 12769079

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] COMPILATION ERROR : 
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[241,27]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[249,33]
 no suitable method found for remove(java.lang.Object,java.lang.Object)
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[257,33]
 cannot find symbol
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[265,27]
 cannot find symbol
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on 
project hbase-common: Compilation failure: Compilation failure:
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[241,27]
 cannot find symbol
[ERROR] symbol:   method putIfAbsent(K,V)
[ERROR] location: variable newMap of type java.util.TreeMap
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[249,33]
 no suitable method found for remove(java.lang.Object,java.lang.Object)
[ERROR] method java.util.TreeMap.remove(java.lang.Object) is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] method java.util.AbstractMap.remove(java.lang.Object) is not applicable
[ERROR] (actual and formal argument lists differ in length)
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[257,33]
 cannot find symbol
[ERROR] symbol:   method replace(K,V,V)
[ERROR] location: variable newMap of type java.util.TreeMap
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-common/src/main/java/org/apache/hadoop/hbase/types/CopyOnWriteTreeMap.java:[265,27]
 cannot find symbol
[ERROR] symbol:   method replace(K,V)
[ERROR] location: variable newMap of type java.util.TreeMap
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-common


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16250//console

This message is automatically generated.

> Use copy on write TreeMap for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708.patch, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase. 
> > 60% of the time was spent in locating a region. This was while the cluster 
> was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a 
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache.

[jira] [Updated] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-27 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-14700:
--
Description: 
When implementing HBase security for an existing cluster, it can be useful to 
support mixed secure and insecure clients while all client configurations are 
migrated over to secure authentication.  

We currently have an option to allow secure clients to fallback to simple auth 
against insecure clusters.  By providing an analogous setting for servers, we 
would allow a phased rollout of security:
# First, security can be enabled on the cluster servers, with the "permissive" 
mode enabled
# Clients can be converted to using secure authentication incrementally
# The server audit logs allow identification of clients still using simple auth 
to connect
# Finally, when sufficient clients have been converted to secure operation, the 
server-side "permissive" mode can be removed, allowing completely secure 
operation.

Obviously with this enabled, there is no effective access control, but this 
would still be a useful tool to enable a smooth operational rollout of 
security.  Permissive mode would of course be disabled by default.  Enabling it 
should provide a big scary warning in the logs on startup, and possibly be 
flagged on relevant UIs.

  was:
When implementing HBase security for an existing cluster, it can be useful to 
support mixed secure and insecure clients while all client configurations are 
migrated over to secure authentication.  

We currently have an option to allow secure clients to fallback to simple auth 
against insecure clusters.  By providing an analogous setting for servers, we 
would allow a phased rollout of security:
#. First, security can be enabled on the cluster servers, with the "permissive" 
mode enabled
#. Clients can be converting to using secure authentication incrementally
#. The server audit logs allow identification of clients still using simple 
auth to connect
#. Finally, when sufficient clients have been converted to secure operation, 
the server-side "permissive" mode can be removed, allowing completely secure 
operation.

Obviously with this enabled, there is no effective access control, but this 
would still be a useful tool to enable a smooth operational rollout of 
security.  Permissive mode would of course be disabled by default.  Enabling it 
should provide a big scary warning in the logs on startup, and possibly be 
flagged on relevant UIs.


> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14674) Rpc handler / task monitoring seems to be broken after 0.98

2015-10-27 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14674:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've pushed this to 0.98+. Thanks Heng. 

> Rpc handler / task monitoring seems to be broken after 0.98
> ---
>
> Key: HBASE-14674
> URL: https://issues.apache.org/jira/browse/HBASE-14674
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 1.3.0, 1.2.1, 1.0.3, 1.1.4, 0.98.17
>
> Attachments: HBASE-14674.patch, HBASE-14674_v1.patch, 
> HBASE-14674_v2.patch
>
>
> In 0.96, we have the RPC handlers listed as tasks and show them in the web UI 
> as well: 
> {code}
> Tasks:
> ===
> Task: RpcServer.handler=0,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=1,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> Task: RpcServer.handler=2,port=64231
> Status: WAITING:Waiting for a call
> Running for 932s
> {code}
> After pluggable RPC scheduler, the way the tasks work for the handlers got 
> changed. We no longer list idle RPC handlers in the tasks, but we register 
> them dynamically to {{TaskMonitor}} through {{CallRunner}}. However, the IPC 
> readers are still registered the old way (meaning that idle readers are 
> listed as tasks, but not idle handlers). 
> From the javadoc of {{MonitoredRPCHandlerImpl}}, it seems that we are NOT 
> optimizing the allocation for the MonitoredTask anymore, but instead allocate 
> one for every RPC call breaking the pattern (See CallRunner.getStatus()). 
> {code}
> /**
>  * A MonitoredTask implementation designed for use with RPC Handlers 
>  * handling frequent, short duration tasks. String concatenations and object 
>  * allocations are avoided in methods that will be hit by every RPC call.
>  */
> @InterfaceAudience.Private
> public class MonitoredRPCHandlerImpl extends MonitoredTaskImpl
> {code}
> There is also one more side effect: since the CallRunner is a per-RPC object 
> created in the RPC listener thread, the created task ends up having the name 
> "listener" although the actual processing happens in a handler thread. This is 
> obviously very confusing during debugging. 
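The allocation-avoidance pattern the javadoc describes can be sketched
generically as one reusable status object per handler thread instead of one per
call; MonitoredStatus below is a stand-in, not the real MonitoredRPCHandlerImpl,
and this is not necessarily how the attached patch fixes the issue:

{code}
// Generic sketch of the allocation-avoidance pattern: one reusable status
// object per handler thread instead of a new allocation for every RPC call.
public class PerThreadStatusSketch {
  static class MonitoredStatus {
    private volatile String status = "Waiting for a call";
    void setStatus(String s) { status = s; }
    String getStatus() { return status; }
  }

  // Created lazily once per handler thread, then reused for every call.
  private static final ThreadLocal<MonitoredStatus> STATUS =
      new ThreadLocal<MonitoredStatus>() {
        @Override
        protected MonitoredStatus initialValue() {
          return new MonitoredStatus();
        }
      };

  static void runCall(Runnable call) {
    MonitoredStatus status = STATUS.get(); // no per-call allocation
    status.setStatus("Processing a call");
    try {
      call.run();
    } finally {
      status.setStatus("Waiting for a call");
    }
  }
}
{code}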



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977243#comment-14977243
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-1.1 #726 (See 
[https://builds.apache.org/job/HBase-1.1/726/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
b0ef01435169c3d3650d60e9ef1c7064fa1bff18)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As was done in HBASE-14631 and HBASE-14605, the scope of the doAs() calls for 
> compaction-related region observer notifications should be narrowed.
> The User object is passed from CompactSplitThread down to the methods where 
> the region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14705:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.98.16
   1.1.3
   1.0.3
   1.3.0
   1.2.0
   2.0.0
   Status: Resolved  (was: Patch Available)

> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.
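One possible corrected javadoc, matching what the constructor actually does; the
wording is illustrative and may differ from the committed text:

{code}
// Illustrative wording only; the committed javadoc may differ.
  /**
   * Constructs a KeyValue for the given row, family, qualifier and value,
   * using Type.Put and HConstants.LATEST_TIMESTAMP.
   * @param row row key (arbitrary byte array)
   * @param family family name
   * @param qualifier column qualifier
   * @param value column value
   */
  public KeyValue(final byte [] row, final byte [] family,
      final byte [] qualifier, final byte [] value) {
    this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, value);
  }
{code}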



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14705) Javadoc for KeyValue constructor is not correct.

2015-10-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977213#comment-14977213
 ] 

Andrew Purtell commented on HBASE-14705:


+1, committing


> Javadoc for KeyValue constructor is not correct.
> 
>
> Key: HBASE-14705
> URL: https://issues.apache.org/jira/browse/HBASE-14705
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
> Attachments: HBASE-14705-trunk.patch
>
>
> {code}
>   /**
>* Constructs KeyValue structure filled with null value.
>* @param row - row key (arbitrary byte array)
>* @param family family name
>* @param qualifier column qualifier
>*/
>   public KeyValue(final byte [] row, final byte [] family,
>   final byte [] qualifier, final byte [] value) {
> this(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Put, 
> value);
>   }
> {code}
> Value is not filled with null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

2015-10-27 Thread Eshcar Hillel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eshcar Hillel updated HBASE-13408:
--
Attachment: HBASE-13408-trunk-v08.patch

> HBase In-Memory Memstore Compaction
> ---
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
>  Issue Type: New Feature
>Reporter: Eshcar Hillel
>Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, 
> HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, 
> HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, 
> HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, 
> HBASE-13408-trunk-v08.patch, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, 
> HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, 
> InMemoryMemstoreCompactionEvaluationResults.pdf, 
> InMemoryMemstoreCompactionMasterEvaluationResults.pdf, 
> InMemoryMemstoreCompactionScansEvaluationResults.pdf, 
> StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its 
> in-memory component. The memstore absorbs all updates to the store; from time 
> to time these updates are flushed to a file on disk, where they are 
> compacted. Unlike disk components, the memstore is not compacted until it is 
> written to the filesystem and optionally to block-cache. This may result in 
> underutilization of the memory due to duplicate entries per row, for example, 
> when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are 
> triggered and the more frequently data sinks to disk, slowing down retrieval 
> even of very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data 
> in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.The data is kept in memory for as long as possible
> 2.Memstore data is either compacted or in process of being compacted 
> 3.Allow a panic mode, which may interrupt an in-progress compaction and 
> force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14655) Narrow the scope of doAs() calls to region observer notifications for compaction

2015-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977180#comment-14977180
 ] 

Hudson commented on HBASE-14655:


FAILURE: Integrated in HBase-1.0 #1102 (See 
[https://builds.apache.org/job/HBase-1.0/1102/])
HBASE-14655 Addendum passes User to store#compact() (tedyu: rev 
10d86f21227fadc94b331640edaad7d67d7f8b97)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


> Narrow the scope of doAs() calls to region observer notifications for 
> compaction
> 
>
> Key: HBASE-14655
> URL: https://issues.apache.org/jira/browse/HBASE-14655
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: 14655-0.98-v9.txt, 14655-0.98-v9.txt, 
> 14655-addendum.txt, 14655-branch-1-v5.txt, 14655-branch-1-v6.txt, 
> 14655-branch-1-v7.txt, 14655-branch-1-v8.txt, 14655-branch-1-v9.txt, 
> 14655-branch-1.0-v10.txt, 14655-branch-1.0-v6.txt, 14655-branch-1.0-v7.txt, 
> 14655-branch-1.0-v8.txt, 14655-branch-1.0-v9.txt, 14655-v1.txt, 14655-v2.txt, 
> 14655-v3.txt, 14655-v4.txt, 14655-v5.txt, 14655-v6.txt, 14655-v7.txt, 
> 14655-v8.txt, 14655-v9.txt
>
>
> As was done in HBASE-14631 and HBASE-14605, the scope of the doAs() calls for 
> compaction-related region observer notifications should be narrowed.
> The User object is passed from CompactSplitThread down to the methods where 
> the region observer notifications are made.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14680) Two configs for snapshot timeout and better defaults

2015-10-27 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14680:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've pushed to 0.98+. Thanks Heng. 

> Two configs for snapshot timeout and better defaults
> 
>
> Key: HBASE-14680
> URL: https://issues.apache.org/jira/browse/HBASE-14680
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-14680.patch, HBASE-14680_v1.patch, 
> HBASE-14680_v2.patch, hbase-14680_v3.patch
>
>
> One of the clusters timed out taking a snapshot for a disabled table. The 
> table is big enough, and the master operation takes more than 1 min to 
> complete. However while trying to increase the timeout, we noticed that there 
> are two parameters with very similar names configuring different things: 
> {{hbase.snapshot.master.timeout.millis}} is defined in 
> SnapshotDescriptionUtils, is sent to the client side, and is used for disabled 
> table snapshots. 
> {{hbase.snapshot.master.timeoutMillis}} is defined in SnapshotManager and 
> used as the timeout for the procedure execution. 
> So, there are a couple of improvements that we can do: 
>  - 1 min is too low for big tables. We need to set this to 5 min or 10 min by 
> default. Even a 6 TB table, which is medium sized, fails. 
>  - Unify the two timeouts into one. Decide on either of them, and deprecate 
> the other. Use the biggest one for BC. 
>  - Add the timeout to hbase-default.xml. 
>  - Why do we even have a timeout for disabled table snapshots? The master is 
> doing the work so we should not timeout in any case. 
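As an operational sketch until the two properties are unified, both timeouts can
be raised together; the property names are taken from the description above, and
the 5 minute value is only an example:

{code}
// Sketch: raise both snapshot timeouts to 5 minutes as a stop-gap until the
// two properties are unified. Values are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SnapshotTimeoutSketch {
  public static Configuration withLongerSnapshotTimeouts() {
    Configuration conf = HBaseConfiguration.create();
    long fiveMinutesMs = 5 * 60 * 1000L;
    // used for disabled-table snapshots (SnapshotDescriptionUtils)
    conf.setLong("hbase.snapshot.master.timeout.millis", fiveMinutesMs);
    // used for the snapshot procedure execution (SnapshotManager)
    conf.setLong("hbase.snapshot.master.timeoutMillis", fiveMinutesMs);
    return conf;
  }
}
{code}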



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

