from:"\"Guanghao Zhang\""

[jira] [Created] (HBASE-19178) table.rb use undefined method 'getType' for Cell interface

2017-11-03 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19178:
--

 Summary: table.rb use undefined method 'getType' for Cell interface
 Key: HBASE-19178
 URL: https://issues.apache.org/jira/browse/HBASE-19178
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (HBASE-19186) Unify to use bytes to show size in master/rs ui

2017-11-04 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19186:
--

 Summary: Unify to use bytes to show size in master/rs ui
 Key: HBASE-19186
 URL: https://issues.apache.org/jira/browse/HBASE-19186
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor


1. Show sizes with an explicit byte unit: 10K ==> 10KB, 10M ==> 10MB, 10G ==> 10GB.
2. Remove "in bytes" from the descriptions.
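Here is a minimal sketch of the intended display rule, assuming binary (1024-based) units. `ByteSizeFormat` and `format` are hypothetical names, not HBase's UI code. The only point is that every size carries an explicit `B` suffix.

```java
import java.util.Locale;

// Hypothetical helper: always print an explicit byte unit ("10KB", not "10K").
public class ByteSizeFormat {
    private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB", "PB"};

    static String format(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        double value = bytes;
        int unit = 0;
        // Scale down by 1024 until the value is below 1024 or the largest unit is reached.
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        // Whole numbers print without a fraction; others keep one decimal place.
        if (value == Math.floor(value)) {
            return String.format(Locale.ROOT, "%d%s", (long) value, UNITS[unit]);
        }
        return String.format(Locale.ROOT, "%.1f%s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(format(10L * 1024));               // 10KB
        System.out.println(format(10L * 1024 * 1024));        // 10MB
        System.out.println(format(10L * 1024 * 1024 * 1024)); // 10GB
    }
}
```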




[jira] [Created] (HBASE-19255) PerformanceEvaluation class not found when run PE test

2017-11-14 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19255:
--

 Summary: PerformanceEvaluation class not found when run PE test
 Key: HBASE-19255
 URL: https://issues.apache.org/jira/browse/HBASE-19255
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


{code}
mvn clean package install -DskipTests
./hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=1 --nomapred randomWrite 1
{code}

PerformanceEvaluation is in the hbase-mapreduce module's test jar. HBASE-18640 
moved the MapReduce code out of hbase-server into a separate hbase-mapreduce 
module, but the hbase-mapreduce test jar was not added to the hbase-assembly 
pom.xml. As a result, the jar is not on the default classpath and 
PerformanceEvaluation cannot be found.
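To confirm the diagnosis, you can check whether the class resolves on a given classpath. The sketch below is a hypothetical standalone helper (`ClasspathCheck` is not part of HBase). It only tries to load the class, without running its static initializers.

```java
// Hypothetical diagnostic: report whether a class can be loaded from the
// current classpath, e.g. to confirm PerformanceEvaluation is missing.
public class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: resolve the class only, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.hbase.PerformanceEvaluation";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found") + " on the classpath");
    }
}
```

For example, compile it and run `java -cp "$(./hbase classpath):." ClasspathCheck`. While the test jar is missing from the assembly, it should report NOT found. This assumes `hbase classpath` prints the same classpath the launcher uses.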




[jira] [Reopened] (HBASE-19009) implement modifyTable and enable/disableTableReplication for AsyncAdmin

2017-11-15 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-19009:


> implement modifyTable and enable/disableTableReplication for AsyncAdmin
> ---
>
> Key: HBASE-19009
> URL: https://issues.apache.org/jira/browse/HBASE-19009
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Guanghao Zhang
>        Assignee: Guanghao Zhang
> Fix For: 3.0.0, 2.0.0-beta-1
>
> Attachments: HBASE-19009.master.001.patch, 
> HBASE-19009.master.002.patch, HBASE-19009.master.003.patch, 
> HBASE-19009.master.004.patch, HBASE-19009.master.005.patch, 
> HBASE-19009.master.006.patch, HBASE-19009.master.007.patch, 
> HBASE-19009.master.008.patch, HBASE-19009.master.009.patch, 
> HBASE-19009.master.010.patch, HBASE-19009.master.011.patch, 
> HBASE-19009.master.012.patch, HBASE-19009.master.addendum.patch
>
>
> Add 3 methods to AsyncAdmin.
> modifyTable()
> enableTableReplication()
> disableTableReplication()




[jira] [Resolved] (HBASE-18912) Update Admin methods to return Lists instead of arrays

2017-11-16 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-18912.

Resolution: Won't Fix

This would require deprecating too many old methods. Unless we can find a better 
method name or have a good reason to deprecate the old methods, I don't think we 
should change only the return type from array to List. Resolving this as Won't 
Fix. Thanks.
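The naming problem comes from a Java language rule: two methods cannot differ only in their return type. The sketch below uses hypothetical names (`AdminSketch`, `listTableNamesAsList`) and plain strings in place of `TableName`. It shows that switching to `List` means adding a new method and deprecating the old one, because the signature cannot simply be edited in place.

```java
import java.util.Arrays;
import java.util.List;

// These two declarations could NOT coexist, since they differ only in return type:
//   TableName[] listTableNames();
//   List<TableName> listTableNames();
// So returning a List forces a new name plus a deprecation. Names are hypothetical.
interface AdminSketch {
    @Deprecated
    String[] listTableNames();

    // A new name is needed only because the return type changed.
    List<String> listTableNamesAsList();
}

public class ReturnTypeDemo implements AdminSketch {
    private final String[] tables = {"t1", "t2"};

    @Override
    @Deprecated
    public String[] listTableNames() {
        return tables.clone();
    }

    @Override
    public List<String> listTableNamesAsList() {
        return Arrays.asList(tables.clone());
    }
}
```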

> Update Admin methods to return Lists instead of arrays
> --
>
> Key: HBASE-18912
> URL: https://issues.apache.org/jira/browse/HBASE-18912
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 2.0.0-beta-1
>
>





[jira] [Resolved] (HBASE-18805) Unify Admin and AsyncAdmin

2017-11-16 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-18805.

Resolution: Fixed

All sub-tasks done.

> Unify Admin and AsyncAdmin
> --
>
> Key: HBASE-18805
> URL: https://issues.apache.org/jira/browse/HBASE-18805
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Balazs Meszaros
>        Assignee: Guanghao Zhang
> Fix For: 2.0.0-beta-1
>
>
> Admin and AsyncAdmin differ in some places:
> - some methods missing from AsyncAdmin (e.g. methods with String regex),
> - some methods have different names (listTables vs listTableDescriptors),
> - some method parameters are different (e.g. AsyncAdmin has Optional<> 
> parameters),
> - AsyncAdmin returns Lists instead of arrays (e.g. listTableNames),
> - unify Javadoc comments,
> - ...
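As a rough illustration of the differences listed above, here are two hypothetical, cut-down interfaces. They are not HBase's real `Admin`/`AsyncAdmin` signatures. The blocking one returns arrays and uses an overload for the regex variant. The async one returns a `List` inside a `CompletableFuture` and takes an `Optional` parameter instead. The adapter shows one way a blocking view could be layered on the async style.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical, simplified shapes of the two API styles.
interface SyncAdminSketch {
    String[] listTableNames();               // returns an array
    String[] listTableNames(String regex);   // separate overload taking a String regex
}

interface AsyncAdminSketch {
    // Returns a List inside a future and takes an Optional instead of overloads.
    CompletableFuture<List<String>> listTableNames(Optional<Pattern> pattern);
}

public class AdminStyles implements AsyncAdminSketch {
    private final List<String> tables = List.of("t1", "t2", "logs");

    @Override
    public CompletableFuture<List<String>> listTableNames(Optional<Pattern> pattern) {
        List<String> result = tables.stream()
            .filter(t -> pattern.map(p -> p.matcher(t).matches()).orElse(true))
            .collect(Collectors.toList());
        return CompletableFuture.completedFuture(result);
    }

    // Blocking view over the async API: join() waits, toArray converts List -> array.
    SyncAdminSketch blocking() {
        return new SyncAdminSketch() {
            @Override
            public String[] listTableNames() {
                return AdminStyles.this.listTableNames(Optional.empty()).join().toArray(new String[0]);
            }

            @Override
            public String[] listTableNames(String regex) {
                return AdminStyles.this.listTableNames(Optional.of(Pattern.compile(regex)))
                    .join().toArray(new String[0]);
            }
        };
    }
}
```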




[jira] [Created] (HBASE-19293) Support add a disabled state replication peer directly

2017-11-17 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19293:
--

 Summary: Support add a disabled state replication peer directly
 Key: HBASE-19293
 URL: https://issues.apache.org/jira/browse/HBASE-19293
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


Currently, a newly added replication peer is enabled by default. To add a 
disabled replication peer, you must first add the peer and then disable it, 
which takes two steps.

Use case for adding a disabled replication peer: a user wants to sync data from 
cluster A to a new peer cluster.
1. Add a disabled replication peer, and add the table to the peer config.
2. Take a snapshot of the table and export the snapshot to the peer cluster.
3. Restore the snapshot in the peer cluster.
4. Enable the peer and wait until all the accumulated replication logs are 
replicated to the peer cluster.
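The ordering argument can be modeled with a tiny in-memory registry. This is a hypothetical sketch, not HBase's replication code. It shows that choosing the initial state at creation time removes the window in which a newly added peer is already enabled before the snapshot has been restored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of peer state; illustrates the workflow only.
public class PeerRegistry {
    private final Map<String, Boolean> enabledByPeer = new HashMap<>();

    // Old behaviour: a new peer always starts enabled, so replication may
    // begin before the caller gets a chance to call disablePeer().
    void addPeer(String peerId) {
        addPeer(peerId, true);
    }

    // Proposed behaviour: the initial state is chosen when the peer is created.
    void addPeer(String peerId, boolean enabled) {
        if (enabledByPeer.putIfAbsent(peerId, enabled) != null) {
            throw new IllegalStateException("peer already exists: " + peerId);
        }
    }

    void enablePeer(String peerId) { setState(peerId, true); }

    void disablePeer(String peerId) { setState(peerId, false); }

    boolean isEnabled(String peerId) {
        Boolean state = enabledByPeer.get(peerId);
        if (state == null) {
            throw new IllegalArgumentException("unknown peer: " + peerId);
        }
        return state;
    }

    private void setState(String peerId, boolean enabled) {
        if (!enabledByPeer.containsKey(peerId)) {
            throw new IllegalArgumentException("unknown peer: " + peerId);
        }
        enabledByPeer.put(peerId, enabled);
    }

    public static void main(String[] args) {
        PeerRegistry registry = new PeerRegistry();
        registry.addPeer("peer1", false); // step 1: add the peer already disabled
        // Steps 2-3 (snapshot, export, restore) happen outside this model.
        registry.enablePeer("peer1");     // step 4: enable and let queued logs drain
        System.out.println("peer1 enabled: " + registry.isEnabled("peer1"));
    }
}
```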




[jira] [Resolved] (HBASE-11386) Replication#table,CF config will be wrong if the table name includes namespace

2017-11-17 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-11386.

Resolution: Duplicate

Resolved by HBASE-11393 and HBASE-16653.

> Replication#table,CF config will be wrong if the table name includes namespace
> --
>
> Key: HBASE-11386
> URL: https://issues.apache.org/jira/browse/HBASE-11386
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Qianxi Zhang
>Assignee: Ashish Singhi
>Priority: Critical
> Fix For: 1.5.0
>
> Attachments: HBASE_11386_trunk_v1.patch, HBASE_11386_trunk_v2.patch
>
>
> Now we can configure the table and CF in Replication, but I think the parsing 
> will be wrong if the table name includes a namespace, in
> ReplicationPeer#parseTableCFsFromConfig (line 125):
> {code}
> Map<String, List<String>> tableCFsMap = null;
> // parse out (table, cf-list) pairs from tableCFsConfig
> // format: "table1:cf1,cf2;table2:cfA,cfB"
> String[] tables = tableCFsConfig.split(";");
> for (String tab : tables) {
>   // 1 ignore empty table config
>   tab = tab.trim();
>   if (tab.length() == 0) {
>     continue;
>   }
>   // 2 split to "table" and "cf1,cf2"
>   //   for each table: "table:cf1,cf2" or "table"
>   String[] pair = tab.split(":");
>   String tabName = pair[0].trim();
>   if (pair.length > 2 || tabName.length() == 0) {
>     LOG.error("ignore invalid tableCFs setting: " + tab);
>     continue;
>   }
> {code}
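You can reproduce the rejection by running the quoted loop on its own. The split logic below is copied from the excerpt. The column-family collection (which the excerpt does not show) and the `System.err` logging are simplifications made for this sketch, not the original source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone copy of the quoted parsing logic, showing how namespaced names break it.
public class TableCFsParseDemo {
    static Map<String, List<String>> parse(String tableCFsConfig) {
        Map<String, List<String>> tableCFsMap = new HashMap<>();
        for (String tab : tableCFsConfig.split(";")) {
            tab = tab.trim();
            if (tab.isEmpty()) {
                continue;
            }
            String[] pair = tab.split(":");
            String tabName = pair[0].trim();
            // "ns1:t1:cf1" splits into 3 parts and is dropped as invalid.
            if (pair.length > 2 || tabName.isEmpty()) {
                System.err.println("ignore invalid tableCFs setting: " + tab);
                continue;
            }
            // Simplified CF collection; null means "no CF list given".
            List<String> cfs = null;
            if (pair.length == 2) {
                cfs = new ArrayList<>();
                for (String cf : pair[1].split(",")) {
                    if (!cf.trim().isEmpty()) {
                        cfs.add(cf.trim());
                    }
                }
            }
            tableCFsMap.put(tabName, cfs);
        }
        return tableCFsMap;
    }
}
```

Note that `ns1:t1` alone is also misparsed, as table `ns1` with column family `t1`. A plain `:`-separated format therefore cannot tell a namespace apart from a column-family list.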




[jira] [Created] (HBASE-19303) Cleanup the usage of deprecated ReplicationAdmin

2017-11-19 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19303:
--

 Summary: Cleanup the usage of deprecated ReplicationAdmin
 Key: HBASE-19303
 URL: https://issues.apache.org/jira/browse/HBASE-19303
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang







[jira] [Created] (HBASE-19334) User.runAsLoginUser not work in AccessController because it use a short circuited connection

2017-11-22 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19334:
--

 Summary: User.runAsLoginUser not work in AccessController because 
it use a short circuited connection
 Key: HBASE-19334
 URL: https://issues.apache.org/jira/browse/HBASE-19334
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang







[jira] [Created] (HBASE-19337) AsyncMetaTableAccessor may hang when call ScanController.terminate many times

2017-11-23 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19337:
--

 Summary: AsyncMetaTableAccessor may hang when call 
ScanController.terminate many times
 Key: HBASE-19337
 URL: https://issues.apache.org/jira/browse/HBASE-19337
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


The relevant code in ScanControllerImpl:
{code}
private void preCheck() {
  Preconditions.checkState(Thread.currentThread() == callerThread,
"The current thread is %s, expected thread is %s, " +
"you should not call this method outside onNext or onHeartbeat",
Thread.currentThread(), callerThread);
  Preconditions.checkState(state.equals(ScanControllerState.INITIALIZED),
"Invalid Stopper state %s", state);
}

@Override
public void terminate() {
  preCheck();
  state = ScanControllerState.TERMINATED;
}
{code}
So calling terminate() on an already terminated scan throws an 
IllegalStateException.
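The failure can be reproduced with a self-contained model of the check above. The sketch drops the caller-thread check and replaces Guava's `Preconditions` with a plain `if`. `terminateOnce` is a hypothetical caller-side guard, not necessarily how the issue was fixed.

```java
// Simplified model of ScanControllerImpl's state check (thread check omitted).
public class ScanControllerSketch {
    public enum State { INITIALIZED, TERMINATED }

    private State state = State.INITIALIZED;

    private void preCheck() {
        if (state != State.INITIALIZED) {
            throw new IllegalStateException("Invalid Stopper state " + state);
        }
    }

    void terminate() {
        preCheck();
        state = State.TERMINATED;
    }

    State state() {
        return state;
    }

    // Hypothetical guard: only terminate while still INITIALIZED, so repeated
    // "stop" decisions do not hit the IllegalStateException.
    static void terminateOnce(ScanControllerSketch controller) {
        if (controller.state() == State.INITIALIZED) {
            controller.terminate();
        }
    }

    public static void main(String[] args) {
        ScanControllerSketch c = new ScanControllerSketch();
        c.terminate();
        try {
            c.terminate(); // second call on an already terminated scan
        } catch (IllegalStateException e) {
            System.out.println("second terminate() threw: " + e.getMessage());
        }
    }
}
```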




[jira] [Created] (HBASE-19349) Introduce wrong version depencency of servlet-api jar

2017-11-26 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19349:
--

 Summary: Introduce wrong version depencency of servlet-api jar
 Key: HBASE-19349
 URL: https://issues.apache.org/jira/browse/HBASE-19349
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Guanghao Zhang
 Fix For: 3.0.0, 2.0.0-beta-1


Build a tarball.
{code}
mvn -DskipTests clean install && mvn -DskipTests package assembly:single
tar zxvf hbase-2.0.0-beta-1-SNAPSHOT-bin.tar.gz
{code}
Then I found a servlet-api-2.5.jar in the lib directory.

I started a distributed cluster with this tarball and got an exception when 
accessing the Master/RS info JSP pages:
{code}
2017-11-27,10:02:05,066 WARN org.eclipse.jetty.server.HttpChannel: /
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncSupported()Z
        at org.eclipse.jetty.server.ResourceService.sendData(ResourceService.java:689)
        at org.eclipse.jetty.server.ResourceService.doGet(ResourceService.java:294)
        at org.eclipse.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:458)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
        at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:113)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
        at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
        at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
        at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
        at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
{code}

I tried mvn dependency:tree but couldn't find why servlet-api-2.5.jar was introduced.

I downloaded hbase-2.0.0-alpha4-bin.tar.gz and did not find servlet-api-2.5.jar 
in it. I also built a tarball from hbase-2.0.0-alpha4-src.tar.gz and did not find 
servlet-api-2.5.jar either. So this was probably introduced by recent commits, 
and it should be fixed before releasing 2.0.0-beta-1.
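A `NoSuchMethodError` like this typically points to an older copy of a class winning on the classpath. One way to confirm which jar supplied `HttpServletRequest` is to ask the class for its code source and check reflectively for `isAsyncSupported`, a Servlet 3.0 method. The sketch below is a hypothetical standalone helper (`JarLocator` is not an HBase tool).

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic: find where a class was loaded from and whether it
// has a given public method (declared or inherited).
public class JarLocator {
    // Returns the jar/directory URL, or "bootstrap" for JDK classes with no code source.
    static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "bootstrap";
        }
        return src.getLocation().toString();
    }

    static boolean hasPublicMethod(Class<?> cls, String name) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "javax.servlet.http.HttpServletRequest";
        Class<?> cls = Class.forName(name);
        System.out.println(name + " loaded from " + locate(cls));
        System.out.println("isAsyncSupported present: " + hasPublicMethod(cls, "isAsyncSupported"));
    }
}
```

Run with the classpath of the affected process. If the reported location is servlet-api-2.5.jar and the method is absent, the old jar is the culprit.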





[jira] [Created] (HBASE-19359) Revisit the default config of hbase client retries number

2017-11-28 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19359:
--

 Summary: Revisit the default config of hbase client retries number
 Key: HBASE-19359
 URL: https://issues.apache.org/jira/browse/HBASE-19359
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


This should be a sub-task of HBASE-19148. Because the retries number affects too 
many unit tests, I opened this issue separately to see the Hadoop QA result.
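For context on why the default matters for tests: every retry sleeps, and the sleep grows with the retry count. The estimate below is a sketch under explicit assumptions. The multiplier table is assumed to mirror `HConstants.RETRY_BACKOFF` in HBase 2.0-era code, the base pause is assumed to be 100 ms (`hbase.client.pause`), and HBase's small random jitter is ignored.

```java
// Rough estimate of total client sleep time across N retries.
// ASSUMPTION: multipliers mirror HConstants.RETRY_BACKOFF; jitter is ignored.
public class RetryBackoffEstimate {
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Pause before retry number 'tries' (0-based); indices past the table reuse the last entry.
    static long pauseMs(long basePauseMs, int tries) {
        int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
        return basePauseMs * RETRY_BACKOFF[idx];
    }

    // Sum of the pauses for retries 0..retries-1.
    static long totalPauseMs(long basePauseMs, int retries) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += pauseMs(basePauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long basePause = 100; // assumed hbase.client.pause, in ms
        for (int retries : new int[] {10, 15, 35}) {
            System.out.println(retries + " retries -> ~" + totalPauseMs(basePause, retries) / 1000.0 + " s of sleep");
        }
    }
}
```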






[jira] [Created] (HBASE-19395) [branch-1] TestEndToEndSplitTransaction.testMasterOpsWhileSplitting fails with NPE

2017-11-30 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19395:
--

 Summary: [branch-1] 
TestEndToEndSplitTransaction.testMasterOpsWhileSplitting fails with NPE
 Key: HBASE-19395
 URL: https://issues.apache.org/jira/browse/HBASE-19395
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Guanghao Zhang


{code}
[INFO] Running org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
[ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 50.388 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
[ERROR] testMasterOpsWhileSplitting(org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction)  Time elapsed: 8.903 s  <<< ERROR!
java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.test(TestEndToEndSplitTransaction.java:239)
        at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.testMasterOpsWhileSplitting(TestEndToEndSplitTransaction.java:148)
{code}




[jira] [Created] (HBASE-19396) Fix flaky test TestHTableMultiplexerFlushCache

2017-11-30 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19396:
--

 Summary: Fix flaky test TestHTableMultiplexerFlushCache
 Key: HBASE-19396
 URL: https://issues.apache.org/jira/browse/HBASE-19396
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor


{code}
[INFO] Running org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 36.67 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache
[ERROR] testOnRegionMove(org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache)  Time elapsed: 4.644 s  <<< FAILURE!
java.lang.AssertionError: Did not find a new RegionServer to use
        at org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache.testOnRegionMove(TestHTableMultiplexerFlushCache.java:160)
{code}




[jira] [Reopened] (HBASE-19239) Fix findbugs and error-prone warnings (branch-1)

2017-12-02 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-19239:


> Fix findbugs and error-prone warnings (branch-1)
> 
>
> Key: HBASE-19239
> URL: https://issues.apache.org/jira/browse/HBASE-19239
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 1.4.0
>
> Attachments: HBASE-19239-branch-1.patch, HBASE-19239-branch-1.patch, 
> HBASE-19239-branch-1.patch, HBASE-19239.branch-1.addendum.patch
>
>
> Fix important findbugs and error-prone warnings on branch-1.4 / branch-1. 
> Forward port as appropriate. 




[jira] [Created] (HBASE-19470) Compaction state in Table web UI is not right when table is disabled

2017-12-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19470:
--

 Summary: Compaction state in Table web UI is not right when table 
is disabled
 Key: HBASE-19470
 URL: https://issues.apache.org/jira/browse/HBASE-19470
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Guanghao Zhang
Priority: Trivial


Table Attributes
Attribute Name | Value | Description
Enabled | false | Is the table enabled
Compaction | [stack trace below] Unknown |
{code}
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:95)
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:85)
org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:371)
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:330)
org.apache.hadoop.hbase.client.HBaseAdmin.getCompactionState(HBaseAdmin.java:3455)
org.apache.hadoop.hbase.generated.master.table_jsp._jspService(table_jsp.java:283)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:113)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1432)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
org.mortbay.jetty.Server.handle(Server.java:326)
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
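Here is one possible way to render this cell, sketched as a hypothetical helper (`CompactionStateCell` is not the table.jsp code). It skips the lookup for a disabled table, whose regions are offline. Any other failure is collapsed into a short message instead of printing every stack frame.

```java
import java.util.concurrent.Callable;

// Hypothetical rendering helper for a compaction-state cell on a status page.
public class CompactionStateCell {
    static String render(boolean tableEnabled, Callable<String> compactionStateLookup) {
        if (!tableEnabled) {
            // Regions of a disabled table are offline, so do not attempt the lookup.
            return "Unknown (table is disabled)";
        }
        try {
            return compactionStateLookup.call();
        } catch (Exception e) {
            // Show a one-line summary instead of dumping the stack trace into the page.
            return "Unknown (" + e.getClass().getSimpleName() + ")";
        }
    }
}
```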




[jira] [Created] (HBASE-19492) Add EXCLUDE_NAMESPACE and EXCLUDE_TABLECFS support to replication peer config

2017-12-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19492:
--

 Summary: Add EXCLUDE_NAMESPACE and EXCLUDE_TABLECFS support to 
replication peer config
 Key: HBASE-19492
 URL: https://issues.apache.org/jira/browse/HBASE-19492
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


This is a follow-up to HBASE-16868. The comments below are copied from HBASE-16868.

The replicate_all flag is useful to avoid misuse of the replication peer config. 
On our cluster, we have additional replication peer configs: EXCLUDE_NAMESPACE 
and EXCLUDE_TABLECFS. Here is more about our use case. We have two online serving 
clusters and one offline cluster for MR/Spark jobs. The two online clusters 
replicate all tables to each other. But not all tables are replicated to the 
offline cluster, because not all tables need OLAP jobs. We have hundreds of 
tables, so if only one table doesn't need to be replicated to the offline 
cluster, you would still have to list a lot of tables in the replication peer 
config. So we added a new config option, EXCLUDE_TABLECFS. Then you only need to 
list the one table that shouldn't be replicated in EXCLUDE_TABLECFS.

When the replicate_all flag is false, you configure NAMESPACE or TABLECFS to 
specify which namespaces/tables should be replicated to the peer cluster. When the 
replicate_all flag is true, you configure EXCLUDE_NAMESPACE or EXCLUDE_TABLECFS to 
specify which namespaces/tables should not be replicated to the peer cluster.
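The allow-list/deny-list semantics described above can be sketched at table granularity. `PeerTableFilter` is a hypothetical model that ignores column families and expects fully qualified `namespace:qualifier` names. It is not HBase's actual filtering code.

```java
import java.util.Set;

// Hypothetical table-level model of the replicate_all / (EXCLUDE_)NAMESPACE / (EXCLUDE_)TABLECFS rules.
public class PeerTableFilter {
    private final boolean replicateAll;
    private final Set<String> namespaces; // NAMESPACE or EXCLUDE_NAMESPACE, depending on replicateAll
    private final Set<String> tables;     // TABLECFS or EXCLUDE_TABLECFS, depending on replicateAll

    private PeerTableFilter(boolean replicateAll, Set<String> namespaces, Set<String> tables) {
        this.replicateAll = replicateAll;
        this.namespaces = namespaces;
        this.tables = tables;
    }

    // replicate_all = false: only the listed namespaces/tables are replicated.
    static PeerTableFilter replicateOnly(Set<String> namespaces, Set<String> tables) {
        return new PeerTableFilter(false, namespaces, tables);
    }

    // replicate_all = true: everything except the listed namespaces/tables is replicated.
    static PeerTableFilter replicateAllExcept(Set<String> excludeNamespaces, Set<String> excludeTables) {
        return new PeerTableFilter(true, excludeNamespaces, excludeTables);
    }

    boolean shouldReplicate(String tableName) {
        int sep = tableName.indexOf(':');
        if (sep <= 0) {
            throw new IllegalArgumentException("expected namespace:qualifier, got " + tableName);
        }
        String namespace = tableName.substring(0, sep);
        boolean listed = namespaces.contains(namespace) || tables.contains(tableName);
        // The same lists act as an allow-list or a deny-list depending on replicate_all.
        return replicateAll ? !listed : listed;
    }
}
```

For the offline-cluster case in the text, `replicateAllExcept(Set.of(), Set.of("default:no_olap"))` needs one entry instead of hundreds. The table name there is only an example.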




[jira] [Created] (HBASE-19495) Fix failed ut TestShell

2017-12-12 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19495:
--

 Summary: Fix failed ut TestShell
 Key: HBASE-19495
 URL: https://issues.apache.org/jira/browse/HBASE-19495
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


Fails on the master branch. Needs debugging.

{code}
[INFO] Running org.apache.hadoop.hbase.client.TestShell
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 722.737 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestShell
[ERROR] testRunShellTests(org.apache.hadoop.hbase.client.TestShell)  Time elapsed: 699.473 s  <<< ERROR!
org.jruby.embed.EvalFailedException: (RuntimeError) Shell unit tests failed. Check output file for details.
        at org.apache.hadoop.hbase.client.TestShell.testRunShellTests(TestShell.java:36)
Caused by: org.jruby.exceptions.RaiseException: (RuntimeError) Shell unit tests failed. Check output file for details.
{code}





[jira] [Created] (HBASE-19522) The complete order is wrong in AsyncBufferedMutatorImpl

2017-12-15 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19522:
--

 Summary: The complete order is wrong in AsyncBufferedMutatorImpl
 Key: HBASE-19522
 URL: https://issues.apache.org/jira/browse/HBASE-19522
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


{code}
List<CompletableFuture<Void>> toComplete = this.futures;
assert toSend.size() == toComplete.size();
this.mutations = new ArrayList<>();
this.futures = new ArrayList<>();
bufferedSize = 0L;
Iterator<CompletableFuture<Void>> toCompleteIter = toComplete.iterator();
for (CompletableFuture<?> future : table.batch(toSend)) {
  future.whenComplete((r, e) -> {
    // next() is called in the callback, so the complete order may differ from the future order
    CompletableFuture<Void> f = toCompleteIter.next();
    if (e != null) {
      f.completeExceptionally(e);
    } else {
      f.complete(null);
    }
  });
}
{code}

Here we call table.batch to get a list of CompletableFutures, one for each 
mutation, and then register a callback on each future. The problem is that 
toCompleteIter.next() is called inside the callback, so the futures may be 
completed in the wrong order (not the same as the mutation order). Moreover, 
since the ArrayList iterator is not thread-safe, different threads may get the 
same future from toCompleteIter.next().
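Here is a minimal sketch of one way to keep the order. It assumes the two lists are index-aligned, as the `assert` above implies. The iterator is advanced on the calling thread while the callbacks are registered, so each callback captures its own target future. `OrderedCompletion` and `linkInOrder` are hypothetical names, and this is not necessarily the committed fix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: pair each batch future with its caller-facing future up front.
public class OrderedCompletion {
    static void linkInOrder(List<CompletableFuture<?>> batchFutures,
                            List<CompletableFuture<Void>> toComplete) {
        if (batchFutures.size() != toComplete.size()) {
            throw new IllegalArgumentException("lists must be index-aligned");
        }
        Iterator<CompletableFuture<Void>> it = toComplete.iterator();
        for (CompletableFuture<?> future : batchFutures) {
            // Advance the iterator here, on the calling thread, not inside the callback.
            CompletableFuture<Void> f = it.next();
            future.whenComplete((r, e) -> {
                if (e != null) {
                    f.completeExceptionally(e);
                } else {
                    f.complete(null);
                }
            });
        }
    }

    // Completes the batch futures out of order and reports each caller future's outcome.
    static String demo() {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        List<CompletableFuture<?>> batch = new ArrayList<>();
        batch.add(first);
        batch.add(second);
        List<CompletableFuture<Void>> toComplete = new ArrayList<>();
        toComplete.add(new CompletableFuture<>());
        toComplete.add(new CompletableFuture<>());
        linkInOrder(batch, toComplete);
        second.completeExceptionally(new RuntimeException("boom")); // second mutation fails first
        first.complete("ok");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < toComplete.size(); i++) {
            sb.append(i).append(toComplete.get(i).isCompletedExceptionally() ? ":failed " : ":ok ");
        }
        return sb.toString().trim();
    }
}
```

In `demo()` the second batch future completes first. With `next()` inside the callback, as in the quoted code, that failure would instead be delivered to the first caller-facing future.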




[jira] [Resolved] (HBASE-18429) ITs attempt to modify immutable table/column descriptors

2017-12-19 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-18429.

Resolution: Fixed
  Assignee: Mike Drob

Resolving this as all sub-tasks are done.

> ITs attempt to modify immutable table/column descriptors
> 
>
> Key: HBASE-18429
> URL: https://issues.apache.org/jira/browse/HBASE-18429
> Project: HBase
>  Issue Type: Umbrella
>  Components: integration tests
>Affects Versions: 2.0.0-alpha-1
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Critical
> Fix For: 2.0.0-beta-1
>
>
> ITs:
> * IntegrationTestIngestWithMOB (HBASE-18419)
> * IntegrationTestDDLMasterFailover (HBASE-18428)
> * IntegrationTestIngestWithEncryption::setUp (HBASE-18440)
> * IntegrationTestBulkLoad::installSlowingCoproc (HBASE-18440)
> Other Related:
> * ChangeBloomFilterAction (HBASE-18419)
> * ChangeCompressionAction (HBASE-18419)
> * ChangeEncodingAction (HBASE-18419)
> * ChangeVersionsAction (HBASE-18419)
> * RemoveColumnAction (HBASE-18419)
> * AddColumnAction::perform (HBASE-18440)
> * ChangeSplitPolicyAction::perform (HBASE-18440)
> * DecreaseMaxHFileSizeAction::perform (HBASE-18440)




[jira] [Created] (HBASE-19563) A few hbase-procedure classes missing @InterfaceAudience annotation

2017-12-19 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19563:
--

 Summary: A few hbase-procedure classes missing @InterfaceAudience 
annotation
 Key: HBASE-19563
 URL: https://issues.apache.org/jira/browse/HBASE-19563
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Reporter: Guanghao Zhang
Priority: Minor


NoopProcedureStore.java
ProcedureStoreBase.java
ProcedureMetrics.java
LockStatus.java
LockAndQueue.java
ProcedureStateSerializer.java




[jira] [Reopened] (HBASE-19492) Add EXCLUDE_NAMESPACE and EXCLUDE_TABLECFS support to replication peer config

2017-12-20 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-19492:


> Add EXCLUDE_NAMESPACE and EXCLUDE_TABLECFS support to replication peer config
> -
>
> Key: HBASE-19492
> URL: https://issues.apache.org/jira/browse/HBASE-19492
> Project: HBase
>  Issue Type: Improvement
>    Reporter: Guanghao Zhang
>        Assignee: Guanghao Zhang
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19492.master.001.patch, 
> HBASE-19492.master.002.patch, HBASE-19492.master.002.patch, 
> HBASE-19492.master.002.patch, HBASE-19492.master.003.patch, 
> HBASE-19492.master.004.patch, HBASE-19492.master.005.patch
>
>




[jira] [Created] (HBASE-19576) Introduce builder for ReplicationPeerConfig and make it immutable

2017-12-20 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19576:
--

 Summary: Introduce builder for ReplicationPeerConfig and make it 
immutable
 Key: HBASE-19576
 URL: https://issues.apache.org/jira/browse/HBASE-19576
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


Introduce a new ReplicationPeerConfigBuilder, deprecate the old set* methods in 
ReplicationPeerConfig, and make the ReplicationPeerConfig instances we hand out 
immutable.
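The builder-plus-immutable-value pattern can be sketched with a cut-down stand-in. `PeerConfigSketch` and its methods are hypothetical. Only a cluster key and a configuration map are modeled. The map is defensively copied and wrapped read-only, so callers cannot mutate a config after it is built.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable value object with a builder; not the real HBase API.
public final class PeerConfigSketch {
    private final String clusterKey;
    private final Map<String, String> configuration;

    private PeerConfigSketch(Builder b) {
        this.clusterKey = b.clusterKey;
        // Copy, then wrap read-only: later builder changes cannot leak into this instance.
        this.configuration = Collections.unmodifiableMap(new HashMap<>(b.configuration));
    }

    public String getClusterKey() { return clusterKey; }

    public Map<String, String> getConfiguration() { return configuration; }

    public static Builder newBuilder() { return new Builder(); }

    public static final class Builder {
        private String clusterKey;
        private final Map<String, String> configuration = new HashMap<>();

        public Builder setClusterKey(String clusterKey) {
            this.clusterKey = clusterKey;
            return this;
        }

        public Builder putConfiguration(String key, String value) {
            configuration.put(key, value);
            return this;
        }

        public PeerConfigSketch build() {
            if (clusterKey == null) {
                throw new IllegalStateException("clusterKey is required");
            }
            return new PeerConfigSketch(this);
        }
    }
}
```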




[jira] [Created] (HBASE-19590) Remove the duplicate code in deprecated ReplicationAdmin

2017-12-21 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19590:
--

 Summary: Remove the duplicate code in deprecated ReplicationAdmin
 Key: HBASE-19590
 URL: https://issues.apache.org/jira/browse/HBASE-19590
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor







[jira] [Created] (HBASE-19591) Cleanup the usage of ReplicationAdmin from hbase-shell

2017-12-21 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19591:
--

 Summary: Cleanup the usage of ReplicationAdmin from hbase-shell
 Key: HBASE-19591
 URL: https://issues.apache.org/jira/browse/HBASE-19591
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang







[jira] [Created] (HBASE-19602) Cleanup the usage of ReplicationAdmin from document

2017-12-22 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19602:
--

 Summary: Cleanup the usage of ReplicationAdmin from document
 Key: HBASE-19602
 URL: https://issues.apache.org/jira/browse/HBASE-19602
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor
 Fix For: 2.0.0-beta-1







[jira] [Created] (HBASE-19618) Remove replicationQueuesClient.class/replicationQueues.class config from ReplicationFactory

2017-12-24 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19618:
--

 Summary: Remove 
replicationQueuesClient.class/replicationQueues.class config from 
ReplicationFactory
 Key: HBASE-19618
 URL: https://issues.apache.org/jira/browse/HBASE-19618
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


When implementing the procedures for replication admin operations, we abstracted 
a replication storage interface in HBASE-19543, so 
ReplicationQueues/ReplicationQueuesClient are no longer used. These interfaces 
are IA.Private, so it is fine to remove them. However, ReplicationFactory still 
has two configs: hbase.region.replica.replication.replicationQueues.class and 
hbase.region.replica.replication.replicationQueuesClient.class. These configs 
were introduced by HBASE-15867, which is only in 2.0, and that feature is no 
longer under active development. In the future, we can implement table-based 
replication against the replication storage interface instead. So let's remove 
them before releasing 2.0.




[jira] [Created] (HBASE-19621) Revisit the methods in ReplicationPeerConfigBuilder

2017-12-24 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19621:
--

 Summary: Revisit the methods in ReplicationPeerConfigBuilder
 Key: HBASE-19621
 URL: https://issues.apache.org/jira/browse/HBASE-19621
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


Add 4 methods to ReplicationPeerConfigBuilder:
addConfiguration
addAllConfiguration
addPeerData
addAllPeerData

Meanwhile, remove setConfiguration and setPeerData from ReplicationPeerConfigBuilder, because the previous ReplicationPeerConfig didn't support setConfiguration and setPeerData, and previous code used getConfiguration().put or putAll to add configuration. The new add methods keep the builder consistent with the old usage.
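A minimal sketch of the proposed add-style methods, using a hypothetical, simplified PeerConfigSketch class rather than the real ReplicationPeerConfig/ReplicationPeerConfigBuilder. The String-keyed peer data map is an assumption made to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ReplicationPeerConfig and its builder.
final class PeerConfigSketch {
  private final Map<String, String> configuration;
  private final Map<String, byte[]> peerData;

  private PeerConfigSketch(Map<String, String> configuration, Map<String, byte[]> peerData) {
    // Copy so later builder calls cannot change an already-built config.
    this.configuration = Collections.unmodifiableMap(new HashMap<>(configuration));
    this.peerData = Collections.unmodifiableMap(new HashMap<>(peerData));
  }

  Map<String, String> getConfiguration() {
    return configuration;
  }

  Map<String, byte[]> getPeerData() {
    return peerData;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private final Map<String, String> configuration = new HashMap<>();
    private final Map<String, byte[]> peerData = new HashMap<>();

    // Equivalent of the old getConfiguration().put(key, value).
    Builder addConfiguration(String key, String value) {
      configuration.put(key, value);
      return this;
    }

    // Equivalent of the old getConfiguration().putAll(map).
    Builder addAllConfiguration(Map<String, String> conf) {
      configuration.putAll(conf);
      return this;
    }

    Builder addPeerData(String key, byte[] value) {
      peerData.put(key, value);
      return this;
    }

    Builder addAllPeerData(Map<String, byte[]> data) {
      peerData.putAll(data);
      return this;
    }

    PeerConfigSketch build() {
      return new PeerConfigSketch(configuration, peerData);
    }
  }
}
```

Returning the builder from every add method lets callers chain calls, while build() copies the maps so a built config is not changed by later builder calls.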






[jira] [Created] (HBASE-19622) Reimplement ReplicationPeers with the new replication storage interface

2017-12-24 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19622:
--

 Summary: Reimplement ReplicationPeers with the new replication storage interface
 Key: HBASE-19622
 URL: https://issues.apache.org/jira/browse/HBASE-19622
 Project: HBase
  Issue Type: Bug
  Components: proc-v2, Replication
Reporter: Guanghao Zhang
 Fix For: HBASE-19397







[jira] [Resolved] (HBASE-17615) Use nonce and procedure v2 for add/remove replication peer

2017-12-25 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-17615.

Resolution: Duplicate

Duplicate of HBASE-19397.

> Use nonce and procedure v2 for add/remove replication peer
> --
>
> Key: HBASE-17615
> URL: https://issues.apache.org/jira/browse/HBASE-17615
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 2.0.0
>    Reporter: Guanghao Zhang
> Fix For: 2.0.0
>
>





[jira] [Created] (HBASE-19630) Add peer cluster key check when add new replication peer

2017-12-26 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19630:
--

 Summary: Add peer cluster key check when add new replication peer
 Key: HBASE-19630
 URL: https://issues.apache.org/jira/browse/HBASE-19630
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2, Replication
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
 Fix For: HBASE-19397







[jira] [Created] (HBASE-19636) All rs should already start work with the new peer change when replication peer procedure is finished

2017-12-26 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19636:
--

 Summary: All rs should already start work with the new peer change when replication peer procedure is finished
 Key: HBASE-19636
 URL: https://issues.apache.org/jira/browse/HBASE-19636
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


When replication peer operations use zk, the master modifies zk directly, and then each rs asynchronously tracks the zk event to start working with the new peer change. When replication peer operations use a procedure, we need to make sure this process is synchronous: all rs should already be working with the new peer change when the procedure is finished.
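One way to picture the synchronous requirement is a sketch in which the procedure waits on one future per region server. The refreshOnServer function is hypothetical and stands in for whatever per-server call completes once that server is working with the new peer state.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SyncPeerChangeSketch {
  /**
   * Returns a future that completes only after every region server has applied the
   * peer change. refreshOnServer is a hypothetical per-server call whose future
   * completes once that server is working with the new peer state.
   */
  static CompletableFuture<Void> refreshAllServers(
      List<String> servers, Function<String, CompletableFuture<Void>> refreshOnServer) {
    List<CompletableFuture<Void>> perServer =
        servers.stream().map(refreshOnServer).collect(Collectors.toList());
    // allOf completes exceptionally if any server fails, so the procedure can retry
    // instead of reporting success early.
    return CompletableFuture.allOf(perServer.toArray(new CompletableFuture<?>[0]));
  }
}
```

The procedure would only move to its final state after this combined future completes, which is the "all rs already work with the new peer change" guarantee described above.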






[jira] [Created] (HBASE-19643) Need to update cache location when get error in AsyncBatchRpcRetryingCaller

2017-12-27 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19643:
--

 Summary: Need to update cache location when get error in AsyncBatchRpcRetryingCaller
 Key: HBASE-19643
 URL: https://issues.apache.org/jira/browse/HBASE-19643
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang







[jira] [Created] (HBASE-19653) Reduce the default hbase.client.start.log.errors.counter

2017-12-27 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19653:
--

 Summary: Reduce the default hbase.client.start.log.errors.counter
 Key: HBASE-19653
 URL: https://issues.apache.org/jira/browse/HBASE-19653
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


As we reduced the default retries number to 10 and the default start log errors counter is 9, the client only logs the error at the last retry. So we should reduce the default hbase.client.start.log.errors.counter, too.

{code}
  /**
   * Configure the number of failures after which the client will start logging. A few failures
   * is fine: region moved, then is not opened, then is overloaded. We try to have an acceptable
   * heuristic for the number of errors we don't log. 9 was chosen because we wait for 1s at
   * this stage.
   */
  public static final String START_LOG_ERRORS_AFTER_COUNT_KEY =
      "hbase.client.start.log.errors.counter";
  public static final int DEFAULT_START_LOG_ERRORS_AFTER_COUNT = 9;
{code}

{code}
public static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};
public static final long DEFAULT_HBASE_CLIENT_PAUSE = 100;
{code}

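The default pause is 100ms, and the backoff multiplier first reaches 10 at the fifth retry (RETRY_BACKOFF[4]), so the 1s pause (100ms * 10 = 1s) is reached there, not after 9 failures. The old comment on DEFAULT_START_LOG_ERRORS_AFTER_COUNT seems wrong...

Open this issue to reduce the default hbase.client.start.log.errors.counter to 5.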
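The arithmetic above can be checked with a small sketch that multiplies the base pause by the backoff table, clamping at the last entry. It ignores any jitter the real client may add, so treat it as an approximation of the retry pauses, not the client's exact code.

```java
final class RetryPauseSketch {
  // Values copied from the snippet above.
  static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};
  static final long DEFAULT_HBASE_CLIENT_PAUSE = 100;

  /** Pause in ms before the given 0-based retry, clamping at the last backoff entry (no jitter). */
  static long pauseMillis(long pause, int tries) {
    int idx = Math.max(0, Math.min(tries, RETRY_BACKOFF.length - 1));
    return pause * RETRY_BACKOFF[idx];
  }

  /** First 0-based retry whose pause is at least targetMillis, or -1 if none within maxTries. */
  static int firstRetryReaching(long pause, long targetMillis, int maxTries) {
    for (int i = 0; i < maxTries; i++) {
      if (pauseMillis(pause, i) >= targetMillis) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 10; i++) {
      System.out.println("retry " + i + ": " + pauseMillis(DEFAULT_HBASE_CLIENT_PAUSE, i) + " ms");
    }
    System.out.println("1s first reached at retry "
        + firstRetryReaching(DEFAULT_HBASE_CLIENT_PAUSE, 1000, 10));
  }
}
```

With the default 100ms pause this prints 1000 ms at retry 4 (the fifth retry) and 10000 ms at retry 8, so by the ninth retry the client is already waiting about 10s rather than 1s.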




[jira] [Created] (HBASE-19665) Add table based replication queues storage back

2017-12-29 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19665:
--

 Summary: Add table based replication queues storage back
 Key: HBASE-19665
 URL: https://issues.apache.org/jira/browse/HBASE-19665
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


I removed it in HBASE-19618, so I am opening this issue to track it. We should add the table-based replication queues storage back after we merge HBASE-19397 to master/branch-2.




[jira] [Resolved] (HBASE-17303) Let master to check and transfer the dead rs's replication queues

2017-12-29 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-17303.

Resolution: Duplicate

Duplicate of HBASE-19633. This problem will not exist after HBASE-19397.

> Let master to check and transfer the dead rs's replication queues
> -
>
> Key: HBASE-17303
> URL: https://issues.apache.org/jira/browse/HBASE-17303
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Guanghao Zhang
>    Assignee: Guanghao Zhang
>
> Dump replication queues result from our cluster.
> {code}
> Found 8 deleted queues, run hbck -fixReplication in order to remove the 
> deleted replication queues
> hostname,24610,1481528189915/80-hostname,24620,1476784763605
> 
> hostname,24620,1476784763605/70-hostname,24630,1470418208092-hostname,24600,1476773709589
> 
> hostname,24630,1481528526258/17000-hostname,24620,1470044455538-hostname,24630,1470037674231-hostname,24600,1476773708489-hostname,24620,1476784763605
> 
> hostname,24620,1481528358531/70-hostname,24600,1476773709589-hostname,24620,1476784763605
> 
> hostname,24600,1481528021595/70-hostname,24630,1470421093464-hostname,24630,1476773708939-hostname,24610,1476779010928-hostname,24620,1476784747260
> hostname,24600,1481528021595/17000-hostname,24620,1476784763605
> 
> hostname,24600,1481528021595/17000-hostname,24630,1475381530644-hostname,24600,1476773709589-hostname,24620,1476784763605
> 
> hostname,24600,1481528021595/17000-hostname,24600,1476773709589-hostname,24620,1476784763605
> Found 2 dead regionservers, restart one regionserver to transfer the queues 
> of dead regionservers
> hostname,24600,1481547616148
> hostname,24620,1476784763605
> {code}
> Now, for a dead rs's replication znode, you need to restart one regionserver to 
> transfer the replication queues of the dead regionservers. With the same idea as 
> HBASE-16336, we can let the master periodically check the dead rs znodes, too, 
> and send the transfer-replication-queues request to any regionserver. Then 
> the dead rs's replication queues can be transferred automatically without 
> waiting for a regionserver restart. Any suggestions are welcome.
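The periodic check proposed above boils down to two steps: find queue owners that are no longer live, and pick any live regionserver to claim their queues. The sketch below shows only that decision logic; the class and method names are hypothetical, and a master chore (for example one driven by ScheduledExecutorService.scheduleAtFixedRate) would be assumed to call it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

final class DeadQueueCheckSketch {
  /** Owners of replication queues that are no longer live, sorted for stable output. */
  static List<String> deadOwners(Collection<String> queueOwners, Set<String> liveServers) {
    List<String> dead = new ArrayList<>();
    for (String owner : queueOwners) {
      if (!liveServers.contains(owner)) {
        dead.add(owner);
      }
    }
    Collections.sort(dead);
    return dead;
  }

  /** Any live regionserver can be asked to claim the queues; empty if none is live. */
  static Optional<String> pickTarget(List<String> liveServers, Random random) {
    if (liveServers.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(liveServers.get(random.nextInt(liveServers.size())));
  }
}
```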




[jira] [Reopened] (HBASE-19729) UserScanQueryMatcher#mergeFilterResponse should return INCLUDE_AND_SEEK_NEXT_ROW when filterResponse is INCLUDE_AND_SEEK_NEXT_ROW

2018-01-08 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-19729:


> UserScanQueryMatcher#mergeFilterResponse should return 
> INCLUDE_AND_SEEK_NEXT_ROW when filterResponse is INCLUDE_AND_SEEK_NEXT_ROW
> -
>
> Key: HBASE-19729
> URL: https://issues.apache.org/jira/browse/HBASE-19729
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>  Labels: scanner
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19729.v1.patch, HBASE-19729.v2.patch, 
> HBASE-19729.v3.patch, HBASE-19729.v4.patch, HBASE-19729.v4.patch
>
>
> As we've discussed in HBASE-19696 
> https://issues.apache.org/jira/browse/HBASE-19696?focusedCommentId=16309644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16309644
> when (filterResponse, matchCode) = (INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE) or 
> (INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE_AND_NEXT_COL), we should return 
> INCLUDE_AND_SEEK_NEXT_ROW as the merged match code.
> Will upload patches for all branches.
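A minimal sketch of the merge rule described in the issue, using hypothetical reduced enums that borrow the names from the description; the real Filter.ReturnCode and matcher codes have more values. Only the two INCLUDE_AND_SEEK_NEXT_ROW combinations come from the issue; keeping the matcher's code for everything else is an assumption of this sketch.

```java
final class MergeResponseSketch {
  // Hypothetical, reduced enums using the names from the issue description.
  enum FilterResponse { INCLUDE, INCLUDE_AND_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

  enum MatchCode { INCLUDE, INCLUDE_AND_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

  static MatchCode merge(FilterResponse filterResponse, MatchCode matchCode) {
    if (filterResponse == FilterResponse.INCLUDE_AND_SEEK_NEXT_ROW) {
      // The filter wants to include this cell and then leave the row, so a weaker
      // INCLUDE or INCLUDE_AND_NEXT_COL from the matcher must not override it.
      return MatchCode.INCLUDE_AND_SEEK_NEXT_ROW;
    }
    // Assumption: other combinations are outside this issue; keep the matcher's code.
    return matchCode;
  }
}
```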




[jira] [Created] (HBASE-19781) Add a new peer state flag for synchronous replication

2018-01-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19781:
--

 Summary: Add a new peer state flag for synchronous replication
 Key: HBASE-19781
 URL: https://issues.apache.org/jira/browse/HBASE-19781
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


The state may be S (STANDBY), DA (DOWNGRADE_ACTIVE), or A (ACTIVE).




[jira] [Created] (HBASE-19783) Change replication peer cluster key/endpoint from a not-null value to null is not allowed

2018-01-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19783:
--

 Summary: Change replication peer cluster key/endpoint from a not-null value to null is not allowed
 Key: HBASE-19783
 URL: https://issues.apache.org/jira/browse/HBASE-19783
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor







[jira] [Created] (HBASE-19818) Scan time limit not work if the filter always filter row key

2018-01-17 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19818:
--

 Summary: Scan time limit not work if the filter always filter row key
 Key: HBASE-19818
 URL: https://issues.apache.org/jira/browse/HBASE-19818
 Project: HBase
  Issue Type: Bug
 Environment: 
[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java]

nextInternal() method.

{code}
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // early check, see HBASE-16296
  if (isFilterDoneInternal()) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}

// Ok, we are good, let's try to get some results from the main heap.
populateResult(results, this.storeHeap, scannerContext, current);
if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
  if (hasFilterRow) {
    throw new IncompatibleFilterException(
        "Filter whose hasFilterRow() returns true is incompatible with scans that must "
            + " stop mid-row because of a limit. ScannerContext:" + scannerContext);
  }
  return true;
}
{code}

If filterRowKey always returns true, the loop always continues before it reaches checkAnyLimitReached. For batch/size limits it is OK to skip the check, as we don't read anything. But for the time limit it is not right: if the filter always filters the row key, we will be stuck here for a long time.
    Reporter: Guanghao Zhang
    Assignee: Guanghao Zhang
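The idea of checking the time limit inside the row-key-filtering path can be sketched in isolation. Everything here is hypothetical: rows are plain strings, excludeRowKey plays the role of filterRowKey (true means skip the row), and the clock is injected so the behavior is testable. How a real scanner reports the time-limit state back to the client is out of scope.

```java
import java.util.Iterator;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

final class TimeLimitedScanSketch {
  enum State { ROW_FOUND, TIME_LIMIT_REACHED, NO_MORE_ROWS }

  /**
   * Skips rows whose key is rejected (excludeRowKey returns true, like filterRowKey),
   * but checks the time limit after every skipped row, so a filter that rejects every
   * row key cannot keep the scanner busy past its deadline.
   */
  static State advance(Iterator<String> rowKeys, Predicate<String> excludeRowKey,
      LongSupplier clockMillis, long deadlineMillis) {
    while (rowKeys.hasNext()) {
      String rowKey = rowKeys.next();
      if (!excludeRowKey.test(rowKey)) {
        return State.ROW_FOUND;
      }
      if (clockMillis.getAsLong() >= deadlineMillis) {
        return State.TIME_LIMIT_REACHED;
      }
    }
    return State.NO_MORE_ROWS;
  }
}
```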






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HBASE-19855) Refactor RegionScannerImpl.nextInternal method

2018-01-24 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19855:
--

 Summary: Refactor RegionScannerImpl.nextInternal method
 Key: HBASE-19855
 URL: https://issues.apache.org/jira/browse/HBASE-19855
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


Now this method is too complicated and confusing...

https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java




[jira] [Created] (HBASE-19918) Promote TestAsyncClusterAdminApi to LargeTests

2018-02-01 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19918:
--

 Summary: Promote TestAsyncClusterAdminApi to LargeTests
 Key: HBASE-19918
 URL: https://issues.apache.org/jira/browse/HBASE-19918
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 2.0.0-beta-1
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds

Found this timeout in our branch-2 nightly jobs, and this test runs for more than 110 seconds on my local computer.




[jira] [Created] (HBASE-19923) Reset peer state and config when refresh replication source failed

2018-02-02 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19923:
--

 Summary: Reset peer state and config when refresh replication source failed
 Key: HBASE-19923
 URL: https://issues.apache.org/jira/browse/HBASE-19923
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


Now we use procedures for replication. When the peer state changes, the RS reads the peer state from storage into its cache. If the RS finds that the peer state changed, it refreshes the replication source. If the refresh fails, the Master retries the procedure. The RS then reads the peer state again, but now the cached peer state already matches storage, so it doesn't refresh the replication source. So we need to reset the cached peer state to the old peer state when the refresh fails.
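A minimal sketch of the proposed rollback, with a hypothetical PeerStateCacheSketch class that caches only an enabled/disabled flag; the refreshSource callback stands in for refreshing the replication source.

```java
import java.util.function.Consumer;

final class PeerStateCacheSketch {
  private boolean cachedEnabled;

  PeerStateCacheSketch(boolean initiallyEnabled) {
    this.cachedEnabled = initiallyEnabled;
  }

  synchronized boolean isEnabled() {
    return cachedEnabled;
  }

  /**
   * Called with the state just read from storage. Refreshes the source only if the state
   * differs from the cache; if the refresh throws, the old cached state is restored so
   * that the retried procedure sees the difference again and refreshes once more.
   */
  synchronized boolean onPeerStateRead(boolean newState, Consumer<Boolean> refreshSource) {
    if (newState == cachedEnabled) {
      return false; // nothing changed, nothing to refresh
    }
    boolean oldState = cachedEnabled;
    cachedEnabled = newState;
    try {
      refreshSource.accept(newState);
      return true;
    } catch (RuntimeException e) {
      cachedEnabled = oldState; // roll back the cache on failure
      throw e;
    }
  }
}
```

Without the rollback in the catch block, the second call after a failed refresh would return false and the source would never be refreshed, which is the bug described above.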




[jira] [Created] (HBASE-19942) Fix flaky TestSimpleRpcScheduler

2018-02-05 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19942:
--

 Summary: Fix flaky TestSimpleRpcScheduler
 Key: HBASE-19942
 URL: https://issues.apache.org/jira/browse/HBASE-19942
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


[https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html]

 

https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1387/testReport/junit/org.apache.hadoop.hbase.ipc/TestSimpleRpcScheduler/testSoftAndHardQueueLimits/
 
h3. Stacktrace

java.lang.AssertionError
    at org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler.testSoftAndHardQueueLimits(TestSimpleRpcScheduler.java:451)




[jira] [Created] (HBASE-19944) Fix timeout TestVisibilityLabelsWithCustomVisLabService

2018-02-06 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19944:
--

 Summary: Fix timeout TestVisibilityLabelsWithCustomVisLabService
 Key: HBASE-19944
 URL: https://issues.apache.org/jira/browse/HBASE-19944
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


[https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1404/testReport/junit/org.apache.hadoop.hbase.security.visibility/TestVisibilityLabelsWithCustomVisLabService/testVisibilityLabelsOnRSRestart/]

 
h3. Error Message

test timed out after 6 milliseconds
h3. Stacktrace

org.junit.runners.model.TestTimedOutException: test timed out after 6 milliseconds




[jira] [Created] (HBASE-19961) Promote TestReplicationAdminWithClusters to LargeTests

2018-02-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19961:
--

 Summary: Promote TestReplicationAdminWithClusters to LargeTests
 Key: HBASE-19961
 URL: https://issues.apache.org/jira/browse/HBASE-19961
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


[https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1535/testReport/junit/org.apache.hadoop.hbase.client.replication/TestReplicationAdminWithClusters/org_apache_hadoop_hbase_client_replication_TestReplicationAdminWithClusters/]

java.lang.Exception: Appears to be stuck in thread Socket Reader #1 for port 56518

It takes 170+ seconds when run locally.

[INFO] Running org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithClusters
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 173.265 s - in org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithClusters




[jira] [Resolved] (HBASE-19961) Promote TestReplicationAdminWithClusters to LargeTests

2018-02-08 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-19961.

Resolution: Duplicate

Duplicate of HBASE-19952.

> Promote TestReplicationAdminWithClusters to LargeTests
> --
>
> Key: HBASE-19961
> URL: https://issues.apache.org/jira/browse/HBASE-19961
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Guanghao Zhang
>Priority: Major
>
> [https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1535/testReport/junit/org.apache.hadoop.hbase.client.replication/TestReplicationAdminWithClusters/org_apache_hadoop_hbase_client_replication_TestReplicationAdminWithClusters/]
> java.lang.Exception: Appears to be stuck in thread Socket Reader #1 for port 
> 56518
>  
> It take 170+ seconds when run it locally.
> [INFO] Running 
> org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithClusters
> [INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 173.265 s - in 
> org.apache.hadoop.hbase.client.replication.TestReplicationAdminWithClusters




[jira] [Reopened] (HBASE-19942) Fix flaky TestSimpleRpcScheduler

2018-02-08 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-19942:


> Fix flaky TestSimpleRpcScheduler
> 
>
> Key: HBASE-19942
> URL: https://issues.apache.org/jira/browse/HBASE-19942
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Guanghao Zhang
>        Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19942.branch-2.001.patch, 
> HBASE-19942.master.001.patch, HBASE-19942.master.addendum.patch
>
>
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html]
>  
> https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/1387/testReport/junit/org.apache.hadoop.hbase.ipc/TestSimpleRpcScheduler/testSoftAndHardQueueLimits/
>  
> h3. Stacktrace
> java.lang.AssertionError at 
> org.apache.hadoop.hbase.ipc.TestSimpleRpcScheduler.testSoftAndHardQueueLimits(TestSimpleRpcScheduler.java:451)




[jira] [Created] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19965:
--

 Summary: Fix flaky TestAsyncRegionAdminApi
 Key: HBASE-19965
 URL: https://issues.apache.org/jira/browse/HBASE-19965
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


See 
[https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]

 

java.lang.AssertionError: expected:<2> but was:<3>
    at org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)

 

Merging regions does not work: the table still has 3 regions after the MergeTableRegionsProcedure finished.

The master started to balance region 9e2773ba1efba79a2defa276e9a26ed4, but because the MergeTableRegionsProcedure pid=138 had started work first, the balance had to wait for the lock. After the merge finished, the MoveRegionProcedure pid=139 started work and assigned 9e2773ba1efba79a2defa276e9a26ed4 to a new region server. This is not right. The MoveRegionProcedure should skip assigning a region that was marked as offline, or we should clear the procedures of the merged regions when the MergeTableRegionsProcedure finishes.
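The first option above (have the move re-check the region once it finally holds the lock) can be sketched as a simple guard. The RegionState enum and the state map are hypothetical simplifications, not the real assignment manager types.

```java
import java.util.Map;

final class MoveRegionGuardSketch {
  // Hypothetical subset of region states for illustration.
  enum RegionState { OPEN, OFFLINE, MERGED, SPLIT }

  /**
   * Re-check the region once the move procedure holds the region lock: a region that was
   * merged away (or otherwise taken offline) while the move was waiting should not be
   * assigned again. Unknown regions are treated as not movable.
   */
  static boolean shouldContinueMove(String encodedRegionName, Map<String, RegionState> states) {
    RegionState state = states.get(encodedRegionName);
    return state == RegionState.OPEN;
  }
}
```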

 

Logs:

2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] master.HMaster(1454): balance hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., source=cd4730e3eae2,39077,1518106776411, destination=cd4730e3eae2,40578,1518106776318
2018-02-08 16:24:44,608 DEBUG [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] procedure2.ProcedureExecutor(868): Stored pid=138, state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false

..

2018-02-08 16:24:50,111 INFO [PEWorker-13] procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; MergeTableRegionsProcedure table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false in 5.5710sec
2018-02-08 16:24:50,113 INFO [PEWorker-13] procedure.MasterProcedureScheduler(813): pid=139, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., source=cd4730e3eae2,39077,1518106776411, destination=cd4730e3eae2,40578,1518106776318 testMergeRegions testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.

 




[jira] [Created] (HBASE-19973) Implement a procedure to replay sync replication wal for standby cluster

2018-02-10 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-19973:
--

 Summary: Implement a procedure to replay sync replication wal for standby cluster
 Key: HBASE-19973
 URL: https://issues.apache.org/jira/browse/HBASE-19973
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang







[jira] [Created] (HBASE-20163) Disable major compaction when standby cluster replay the remote wals

2018-03-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20163:
--

 Summary: Disable major compaction when standby cluster replay the remote wals
 Key: HBASE-20163
 URL: https://issues.apache.org/jira/browse/HBASE-20163
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang







[jira] [Created] (HBASE-20524) Need to clear metrics when ReplicationSourceManager refresh replication sources

2018-05-03 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20524:
--

 Summary: Need to clear metrics when ReplicationSourceManager refresh replication sources
 Key: HBASE-20524
 URL: https://issues.apache.org/jira/browse/HBASE-20524
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


When ReplicationSourceManager refreshes replication sources, it closes the old source first and then starts up a new source. The new source uses new metrics, but the metrics of the old source are never cleared.
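Why the stale metrics matter can be shown with a sketch that assumes, for illustration only, that each source's metrics also feed a server-wide aggregate. All names are hypothetical, and the class is not thread-safe.

```java
final class ReplicationMetricsSketch {
  private long globalSizeOfLogQueue; // aggregate over all sources on this server

  final class SourceMetrics {
    private long sizeOfLogQueue;

    void incrSizeOfLogQueue() {
      sizeOfLogQueue++;
      globalSizeOfLogQueue++;
    }

    /** Remove this source's contribution from the global aggregate. */
    void clear() {
      globalSizeOfLogQueue -= sizeOfLogQueue;
      sizeOfLogQueue = 0;
    }
  }

  SourceMetrics newSourceMetrics() {
    return new SourceMetrics();
  }

  long globalSizeOfLogQueue() {
    return globalSizeOfLogQueue;
  }

  /** Refresh as described above: close the old source, clear its metrics, start a new one. */
  SourceMetrics refresh(SourceMetrics old) {
    old.clear();
    return newSourceMetrics();
  }
}
```

If refresh skipped old.clear(), the aggregate would keep counting the closed source's queue size on top of the new source's.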




[jira] [Created] (HBASE-20529) Make sure that there are no remote wals when transit cluster from DA to A

2018-05-03 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20529:
--

 Summary: Make sure that there are no remote wals when transit cluster from DA to A
 Key: HBASE-20529
 URL: https://issues.apache.org/jira/browse/HBASE-20529
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Guanghao Zhang


Consider two clusters in A and S state, and we transit A to DA. Later we want to transit DA back to A; since the remote cluster is in S, we should be able to do it. But there may still be remote wals on HDFS for the cluster in S state, so we need to wait for the remote wals to be removed before transiting the cluster in DA state to A. We need to add a check for this.
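A minimal sketch of the proposed check, assuming A, DA, and S stand for ACTIVE, DOWNGRADE_ACTIVE, and STANDBY. The class and the list of remaining remote WALs are hypothetical, and only the DA to A transition is validated here.

```java
import java.util.List;

final class SyncReplicationTransitionSketch {
  // Assumed meaning of the abbreviations A, DA and S.
  enum State { ACTIVE, DOWNGRADE_ACTIVE, STANDBY }

  /**
   * Rejects DA -> A while remote WALs for the peer still exist on HDFS. Other transitions
   * are not validated here; this sketch covers only the check proposed in the issue.
   */
  static void checkTransition(State from, State to, List<String> remainingRemoteWals) {
    if (from == State.DOWNGRADE_ACTIVE && to == State.ACTIVE && !remainingRemoteWals.isEmpty()) {
      throw new IllegalStateException("Cannot transit DA to A: " + remainingRemoteWals.size()
          + " remote WAL(s) still need to be removed");
    }
  }
}
```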




[jira] [Created] (HBASE-20536) Make TestRegionServerAccounting stable and it should not use absolute number

2018-05-07 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20536:
--

 Summary: Make TestRegionServerAccounting stable and it should not use absolute number
 Key: HBASE-20536
 URL: https://issues.apache.org/jira/browse/HBASE-20536
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


TestRegionServerAccounting failed on our internal Jenkins job because we configure Xmx to 10G. We should change the absolute numbers to relative values.
{code:java}
new MemStoreSize((3L * 1024L * 1024L * 1024L), (1L * 1024L * 1024L * 1024L), 0);
{code}
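A minimal sketch of deriving test sizes from the JVM heap instead of hard-coding 3G/1G, so the values scale with -Xmx. The fractions below are hypothetical placeholders; the right ones depend on what the test asserts against, and the results would replace the literals passed to new MemStoreSize(...) in the snippet above.

```java
final class RelativeSizeSketch {
  /** A size expressed as a fraction of the heap, so the test scales with whatever -Xmx is set. */
  static long fractionOfHeap(long maxHeapBytes, double fraction) {
    return (long) (maxHeapBytes * fraction);
  }

  public static void main(String[] args) {
    // maxMemory() may return Long.MAX_VALUE when the JVM has no heap limit.
    long maxHeap = Runtime.getRuntime().maxMemory();
    // Hypothetical fractions for illustration only.
    long dataSize = fractionOfHeap(maxHeap, 0.3);
    long heapSize = fractionOfHeap(maxHeap, 0.1);
    System.out.println("dataSize=" + dataSize + " heapSize=" + heapSize);
  }
}
```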




[jira] [Created] (HBASE-20583) SplitLogWorker should handle FileNotFoundException when split a wal

2018-05-14 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20583:
--

 Summary: SplitLogWorker should handle FileNotFoundException when split a wal
 Key: HBASE-20583
 URL: https://issues.apache.org/jira/browse/HBASE-20583
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


When a split task is finished, the master deletes the wal first and then removes the task's zk node. So if the master crashes after deleting the wal, the zk task node may be left on zk. When the master resubmits this task, the task will fail with FileNotFoundException.

We already handle FileNotFoundException in WALSplitter, but not in SplitLogWorker.

 
{code:java}
try {
  try {
    in = getReader(path, reporter);
  } catch (EOFException e) {
    if (length <= 0) {
      // TODO should we ignore an empty, not-last log file if skip.errors
      // is false? Either way, the caller should decide what to do. E.g.
      // ignore if this is the last log in sequence.
      // TODO is this scenario still possible if the log has been
      // recovered (i.e. closed)
      LOG.warn("Could not open {} for reading. File is empty", path, e);
    }
    // EOFException being ignored
    return null;
  }
} catch (IOException e) {
  if (e instanceof FileNotFoundException) {
    // A wal file may not exist anymore. Nothing can be recovered so move on
    LOG.warn("File {} does not exist anymore", path, e);
    return null;
  }
  // (remaining IOException handling omitted in this excerpt)
}
{code}
{code:java}
// Here fs.getFileStatus may throw FileNotFoundException, too. We should handle
// this exception the same way WALSplitter.getReader does.
try {
  if (!WALSplitter.splitLogFile(walDir, fs.getFileStatus(new Path(walDir, filename)),
      fs, conf, p, sequenceIdChecker,
      server.getCoordinatedStateManager().getSplitLogWorkerCoordination(), factory)) {
    return Status.PREEMPTED;
  }
} // (catch clauses omitted in this excerpt)
{code}
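A minimal sketch of the handling proposed above, using hypothetical StatusLookup/Splitter interfaces in place of fs.getFileStatus(...) and WALSplitter.splitLogFile(...), and a simplified Status enum. Reporting DONE for a missing WAL mirrors WALSplitter's "nothing can be recovered so move on"; whether the real worker should report DONE or another status is a judgment call this sketch does not settle.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class SplitWalTaskSketch {
  enum Status { DONE, PREEMPTED, ERR }

  // Hypothetical stand-ins for fs.getFileStatus(...) and WALSplitter.splitLogFile(...).
  interface StatusLookup<S> {
    S getFileStatus(String path) throws IOException;
  }

  interface Splitter<S> {
    boolean split(S fileStatus) throws IOException;
  }

  static <S> Status splitWal(String walPath, StatusLookup<S> fs, Splitter<S> splitter) {
    S fileStatus;
    try {
      fileStatus = fs.getFileStatus(walPath);
    } catch (FileNotFoundException e) {
      // The WAL is already gone (e.g. the master deleted it, then crashed before removing
      // the task znode). Nothing is left to recover, so treat the task as finished.
      return Status.DONE;
    } catch (IOException e) {
      return Status.ERR;
    }
    try {
      return splitter.split(fileStatus) ? Status.DONE : Status.PREEMPTED;
    } catch (IOException e) {
      return Status.ERR;
    }
  }
}
```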
 

 




[jira] [Created] (HBASE-20589) Don't need to assign meta to a new RS when standby master become active

2018-05-15 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20589:
--

 Summary: Don't need to assign meta to a new RS when standby master become active
 Key: HBASE-20589
 URL: https://issues.apache.org/jira/browse/HBASE-20589
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


I found this problem when writing a UT for HBASE-20569. The master's finishActiveMasterInitialization now introduces a new RecoverMetaProcedure (HBASE-18261), which has an AssignProcedure as a sub-procedure. AssignProcedure skips assigning a region when the region state is OPEN and the server is online. But the new region state node is created with state OFFLINE, so meta is assigned to a new RS, and the old RS is killed when it reports to the master. This makes master initialization take a long time. I will attach a UT to show this. FYI [~stack]
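The skip condition described above can be sketched with hypothetical, simplified types to show why meta gets reassigned: a freshly created state node starts OFFLINE, so the "OPEN and server online" check fails even though meta is still open on a live RS.

```java
final class AssignSkipSketch {
  // Hypothetical subset of region states.
  enum State { OFFLINE, OPENING, OPEN }

  // A freshly created state node starts OFFLINE, as described above.
  static final class RegionStateNode {
    State state = State.OFFLINE;
  }

  /** The skip condition described above: only an OPEN region on an online server is skipped. */
  static boolean shouldSkipAssign(RegionStateNode node, boolean serverOnline) {
    return node.state == State.OPEN && serverOnline;
  }
}
```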




[jira] [Created] (HBASE-20610) Procedure V2 - Distributed Log Splitting

2018-05-21 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20610:
--

 Summary: Procedure V2 - Distributed Log Splitting
 Key: HBASE-20610
 URL: https://issues.apache.org/jira/browse/HBASE-20610
 Project: HBase
  Issue Type: Umbrella
  Components: proc-v2
Reporter: Guanghao Zhang
 Fix For: 3.0.0


Now the master and regionservers use zk to coordinate log split tasks. The split log manager manages all log files that need to be scanned and split. It places all the logs into the ZooKeeper splitWAL node (/hbase/splitWAL) as tasks, monitors these task nodes, and waits for them to be processed. Each regionserver watches the splitWAL znode, grabs a task when the node's children change, and does the work to split the logs.

Open this umbrella issue to move this "coordinate" work to the new procedure v2 framework and reduce the zk dependency. Plan to finish this before the 3.0 release. Any suggestions are welcome. Thanks.




[jira] [Created] (HBASE-20678) NPE in ReplicationSourceManager#NodeFailoverWorker

2018-06-03 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20678:
--

 Summary: NPE in ReplicationSourceManager#NodeFailoverWorker
 Key: HBASE-20678
 URL: https://issues.apache.org/jira/browse/HBASE-20678
 Project: HBase
  Issue Type: Umbrella
Reporter: Guanghao Zhang


2018-06-04 10:28:43,362 INFO  [ReplicationExecutor-0] replication.ZKReplicationQueueStorage(432): Claim queue queueId=1 from hao-optiplex-7050,38491,1528079278158 to hao-optiplex-7050,39931,1528079278272 failed with org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode, someone else took the log?
Exception in thread "ReplicationExecutor-0" java.lang.NullPointerException
  at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:858)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)


ZKReplicationQueueStorage's claimQueue method may return null when it gets a NoNodeException, and the code below dereferences the result without a null check:
{code:java}
  Pair<String, SortedSet<String>> peer = queueStorage.claimQueue(deadRS,
      queues.get(ThreadLocalRandom.current().nextInt(queues.size())), server.getServerName());
  long sleep = sleepBeforeFailover / 2;
  if (!peer.getSecond().isEmpty()) {
    newQueues.put(peer.getFirst(), peer.getSecond());
    sleep = sleepBeforeFailover;
  }
{code}
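A null-safe version of the loop body above might look like the following sketch. Pair is a hypothetical minimal stand-in for HBase's Pair class, and recordClaim is an illustrative helper name, not existing code.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;

final class ClaimQueueSketch {
  // Hypothetical minimal stand-in for HBase's Pair class.
  static final class Pair<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
      this.first = first;
      this.second = second;
    }

    A getFirst() { return first; }

    B getSecond() { return second; }
  }

  /**
   * claimQueue may return null when another server already took the queue
   * (NoNodeException), so check before dereferencing. Returns how long to sleep
   * before the next claim attempt, matching the loop body above.
   */
  static long recordClaim(Pair<String, SortedSet<String>> peer,
      Map<String, Set<String>> newQueues, long sleepBeforeFailover) {
    long sleep = sleepBeforeFailover / 2;
    if (peer != null && !peer.getSecond().isEmpty()) {
      newQueues.put(peer.getFirst(), peer.getSecond());
      sleep = sleepBeforeFailover;
    }
    return sleep;
  }
}
```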




[jira] [Created] (HBASE-20698) Master don't record right server version until new started region server call regionServerReport method

2018-06-07 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20698:
--

 Summary: Master don't record right server version until new started region server call regionServerReport method
 Key: HBASE-20698
 URL: https://issues.apache.org/jira/browse/HBASE-20698
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.0.0
Reporter: Guanghao Zhang


When a new region server starts, it calls regionServerStartup first. The master records this server as a new online server and may dispatch a RemoteProcedure to it. But the master only records the server version when the new region server calls the regionServerReport method, so dispatching a new RemoteProcedure to this new region server will fail if the version is not right.

{code:java}
  @Override
  protected void remoteDispatch(final ServerName serverName,
      final Set remoteProcedures) {
    final int rsVersion = master.getAssignmentManager().getServerVersion(serverName);
    if (rsVersion >= RS_VERSION_WITH_EXEC_PROCS) {
      LOG.trace("Using procedure batch rpc execution for serverName={} version={}",
          serverName, rsVersion);
      submitTask(new ExecuteProceduresRemoteCall(serverName, remoteProcedures));
    } else {
      LOG.info(String.format(
          "Fallback to compat rpc execution for serverName=%s version=%s",
          serverName, rsVersion));
      submitTask(new CompatRemoteProcedureResolver(serverName, remoteProcedures));
    }
  }
{code}

The above code uses the version to resolve compatibility problems, so dispatch works correctly for old-version region servers. But RefreshPeerProcedure is new since HBase 2.0, so it doesn't need this fallback. However, while the new region server's version is not yet recorded correctly, the dispatcher uses CompatRemoteProcedureResolver for RefreshPeerProcedure too, so the RefreshPeerProcedure cannot be executed correctly.
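A minimal sketch of the window described above, assuming versions are kept in a map that is filled only on regionServerReport and that a missing version reads as 0. `VersionGatedDispatchSketch`, `onStartup`, `onReport` and the threshold value are hypothetical; they only model the branch taken in `remoteDispatch`.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

enum DispatchPath { EXECUTE_PROCEDURES, COMPAT }

final class VersionGatedDispatchSketch {
  // Assumed value standing in for RS_VERSION_WITH_EXEC_PROCS (2.0.0 encoded as 0x0200000).
  static final int RS_VERSION_WITH_EXEC_PROCS = 0x0200000;

  private final Set<String> onlineServers = ConcurrentHashMap.newKeySet();
  private final Map<String, Integer> versions = new ConcurrentHashMap<>();

  /** regionServerStartup: the server is online and may receive procedures right away. */
  void onStartup(String server) {
    onlineServers.add(server);
  }

  /** regionServerReport: only at this point is the version recorded. */
  void onReport(String server, int version) {
    versions.put(server, version);
  }

  boolean isOnline(String server) {
    return onlineServers.contains(server);
  }

  /** Like getServerVersion: a version that was never recorded reads as 0. */
  int getServerVersion(String server) {
    return versions.getOrDefault(server, 0);
  }

  /** The same branch as remoteDispatch. */
  DispatchPath choosePath(String server) {
    return getServerVersion(server) >= RS_VERSION_WITH_EXEC_PROCS
        ? DispatchPath.EXECUTE_PROCEDURES
        : DispatchPath.COMPAT;
  }
}
```

Between `onStartup` and `onReport` the server is already online, yet `choosePath` returns `COMPAT`, which is the window in which RefreshPeerProcedure is routed to the compat resolver.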




[jira] [Reopened] (HBASE-20698) Master don't record right server version until new started region server call regionServerReport method

2018-06-09 Thread Guanghao Zhang (JIRA)



 [ https://issues.apache.org/jira/browse/HBASE-20698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang reopened HBASE-20698:


Reopening this, as I found another problem. When a region server expires, it is removed from onlineServers, and getServerVersion may return 0 for a server that is not in onlineServers. RSProcedureDispatcher is a ServerListener, so there is a race between ServerManager and RSProcedureDispatcher: for a RefreshPeerProcedure whose target server has expired, addOperationToNode may succeed, but remoteDispatch may then get version 0, and the RefreshPeerProcedure will fail to dispatch.

> Master don't record right server version until new started region server call 
> regionServerReport method
> ---
>
> Key: HBASE-20698
> URL: https://issues.apache.org/jira/browse/HBASE-20698
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.0.1
>
> Attachments: HBASE-20698.master.001.patch, 
> HBASE-20698.master.002.patch, HBASE-20698.master.003.patch




[jira] [Created] (HBASE-20709) CompatRemoteProcedureResolver should call remoteCallFailed method instead of throw UnsupportedOperationException

2018-06-09 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20709:
--

 Summary: CompatRemoteProcedureResolver should call 
remoteCallFailed method instead of throw UnsupportedOperationException
 Key: HBASE-20709
 URL: https://issues.apache.org/jira/browse/HBASE-20709
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
{code:java}
@Override
public void dispatchServerOperations(MasterProcedureEnv env, List operations) {
  throw new UnsupportedOperationException();
}
{code}
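As the summary suggests, the compat resolver could report each operation as failed instead of throwing. A minimal sketch, assuming a hypothetical `RemoteOperationSketch` callback in place of HBase's remote procedure types; the real remoteCallFailed signature is not shown in this issue.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical callback standing in for a remote procedure's failure hook. */
interface RemoteOperationSketch {
  void remoteCallFailed(String serverName, IOException cause);
}

final class CompatResolverSketch {
  private final String serverName;

  CompatResolverSketch(String serverName) {
    this.serverName = serverName;
  }

  /**
   * Instead of throwing UnsupportedOperationException, marks every operation as failed
   * so that each owning procedure is notified and can decide how to proceed.
   */
  void dispatchServerOperations(List<? extends RemoteOperationSketch> operations) {
    IOException cause = new IOException(
        "Server " + serverName + " does not support this remote operation");
    for (RemoteOperationSketch op : operations) {
      op.remoteCallFailed(serverName, cause);
    }
  }
}
```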





[jira] [Created] (HBASE-20713) Revisit why to removeFromRunQueue in MasterProcedureExecutor's doPoll method

2018-06-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20713:
--

 Summary: Revisit why to removeFromRunQueue in 
MasterProcedureExecutor's doPoll method 
 Key: HBASE-20713
 URL: https://issues.apache.org/jira/browse/HBASE-20713
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L210
{code:java}
if (rq.isEmpty() || xlockReq) {
  removeFromRunQueue(fairq, rq);
} else if (rq.getLockStatus().hasParentLock(pollResult)) {
  // if the rq is in the fairq because of runnable child
  // check if the next procedure is still a child.
  // if not, remove the rq from the fairq and go back to the xlock state
  Procedure nextProc = rq.peek();
  if (nextProc != null && !Procedure.haveSameParent(nextProc, pollResult)) {
removeFromRunQueue(fairq, rq);
  }
}
{code}
The comment above explains why the queue is removed from the run queue. If I am not wrong, the assumption here is that the parent procedure always requires an exclusive lock, so if nextProc is a child but has a different parent from the current procedure, we can remove the queue from the run queue.
But there may be three levels of procedures: Procedure A's child is Procedure B, and Procedure B's child is Procedure C. Only Procedure A needs the exclusive lock; Procedures B and C don't. Is the condition (!Procedure.haveSameParent(nextProc, pollResult)) still right for this case?
FYI [~stack]
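A minimal sketch of the three-level case, assuming haveSameParent is true only when both procedures have a parent and the parent ids match. `ProcSketch` and `wouldRemoveFromRunQueue` are hypothetical stand-ins for Procedure and the condition in doPoll.

```java
/** Hypothetical stand-in for a Procedure with an optional parent. */
final class ProcSketch {
  static final long NO_PARENT = -1L;
  final long procId;
  final long parentProcId;

  ProcSketch(long procId, long parentProcId) {
    this.procId = procId;
    this.parentProcId = parentProcId;
  }

  boolean hasParent() {
    return parentProcId != NO_PARENT;
  }

  /** Assumed semantics of Procedure.haveSameParent. */
  static boolean haveSameParent(ProcSketch a, ProcSketch b) {
    return a.hasParent() && b.hasParent() && a.parentProcId == b.parentProcId;
  }

  /** The condition from doPoll: remove the queue when the next proc is not a sibling. */
  static boolean wouldRemoveFromRunQueue(ProcSketch nextProc, ProcSketch pollResult) {
    return nextProc != null && !haveSameParent(nextProc, pollResult);
  }
}
```

With pollResult = B and nextProc = C, the check removes the queue even though C runs under A's exclusive lock, which is the case the question raises.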




[jira] [Created] (HBASE-20779) Server version is not right when enable TABLES_ON_MASTER

2018-06-22 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-20779:
--

 Summary: Server version is not right when enable TABLES_ON_MASTER
 Key: HBASE-20779
 URL: https://issues.apache.org/jira/browse/HBASE-20779
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


When TABLES_ON_MASTER is enabled, the master also acts as a region server and carries regions, so it reports to itself too. We get the server version from the RPC call, but the master's report to itself skips the RPC call, so ServerManager records a wrong version of 0.




[jira] [Reopened] (HBASE-20697) Can't cache All region locations of the specify table by calling table.getRegionLocator().getAllRegionLocations()

2018-07-11 Thread Guanghao Zhang (JIRA)



 [ https://issues.apache.org/jira/browse/HBASE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang reopened HBASE-20697:


Reopening to fix checkstyle issues for branch-1.

> Can't cache All region locations of the specify table by calling 
> table.getRegionLocator().getAllRegionLocations()
> -
>
> Key: HBASE-20697
> URL: https://issues.apache.org/jira/browse/HBASE-20697
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.2.6, 2.0.1
>Reporter: zhaoyuan
>Assignee: zhaoyuan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.6, 2.0.2, 2.2.0, 2.1.1
>
> Attachments: HBASE-20697.branch-1.2.001.patch, 
> HBASE-20697.branch-1.2.002.patch, HBASE-20697.branch-1.2.003.patch, 
> HBASE-20697.branch-1.2.004.patch, HBASE-20697.master.001.patch, 
> HBASE-20697.master.002.patch, HBASE-20697.master.002.patch, 
> HBASE-20697.master.003.patch
>
>
> When we upgrade and restart a new version of an application which reads from and writes to HBase, we get some operation timeouts. The timeouts are expected because, when the application restarts, it holds no cached region locations and has to talk to ZooKeeper and the meta region server to get them.
> We want to avoid these timeouts, so we do warmup work. As far as I know, the method table.getRegionLocator().getAllRegionLocations() fetches all region locations and caches them. However, it didn't work well: there were still a lot of timeouts, which confused me.
> I dug into the source code and found the following:
> {code:java}
> public List<HRegionLocation> getAllRegionLocations() throws IOException {
>   TableName tableName = getName();
>   NavigableMap<HRegionInfo, ServerName> locations =
>       MetaScanner.allTableRegions(this.connection, tableName);
>   ArrayList<HRegionLocation> regions = new ArrayList<>(locations.size());
>   for (Entry<HRegionInfo, ServerName> entry : locations.entrySet()) {
>     regions.add(new HRegionLocation(entry.getKey(), entry.getValue()));
>   }
>   if (regions.size() > 0) {
>     connection.cacheLocation(tableName, new RegionLocations(regions));
>   }
>   return regions;
> }
>
> // In MetaCache
> public void cacheLocation(final TableName tableName, final RegionLocations locations) {
>   byte[] startKey = locations.getRegionLocation().getRegionInfo().getStartKey();
>   ConcurrentMap<byte[], RegionLocations> tableLocations = getTableLocations(tableName);
>   RegionLocations oldLocation = tableLocations.putIfAbsent(startKey, locations);
>   boolean isNewCacheEntry = (oldLocation == null);
>   if (isNewCacheEntry) {
>     if (LOG.isTraceEnabled()) {
>       LOG.trace("Cached location: " + locations);
>     }
>     addToCachedServers(locations);
>     return;
>   }
>   // ... (rest of the method not shown in the report)
> {code}
> It collects all regions into one RegionLocations object and caches it only under the start key of the first non-null region location. Then, when we put to or get from HBase, we call getCachedLocation():
> {code:java}
> public RegionLocations getCachedLocation(final TableName tableName, final byte[] row) {
>   ConcurrentNavigableMap<byte[], RegionLocations> tableLocations =
>       getTableLocations(tableName);
>   Entry<byte[], RegionLocations> e = tableLocations.floorEntry(row);
>   if (e == null) {
>     if (metrics != null) metrics.incrMetaCacheMiss();
>     return null;
>   }
>   RegionLocations possibleRegion = e.getValue();
>   // make sure that the end key is greater than the row we're looking
>   // for, otherwise the row actually belongs in the next region, not
>   // this one. the exception case is when the endkey is
>   // HConstants.EMPTY_END_ROW, signifying that the region we're
>   // checking is actually the last region in the table.
>   byte[] endKey = possibleRegion.getRegionLocation().getRegionInfo().getEndKey();
>   if (Bytes.equals(endKey, HConstants.EMPTY_END_ROW) ||
>       getRowComparator(tableName).compareRows(
>           endKey, 0, endKey.length, row, 0, row.length) > 0) {
>     if (metrics != null) metrics.incrMetaCacheHit();
>     return possibleRegion;
>   }
>   // Passed all the way through, so we got nothing - complete cache miss
>   if (metrics != null) metrics.incrMetaCacheMiss();
>   return null;
> }
> {code}
> It picks that single cached entry as possibleRegion and checks only its first location's end key, so for any row outside the first region the end key check fails and the lookup misses the cache.
> So did I miss something, or am I wrong somewhere? If this is indeed a bug, I think it is not very hard to fix.
> I hope committers and PMC members can review this!
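A minimal sketch of the mismatch, assuming string row keys and a TreeMap in place of MetaCache's map of byte[] keys. `RegionSketch` and `MetaCacheSketch` are hypothetical; `cacheEachRegion` shows one plausible way to avoid the problem and is not necessarily the patch that was committed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical region covering rows in [startKey, endKey); an empty endKey means "last region". */
final class RegionSketch {
  final String startKey;
  final String endKey;

  RegionSketch(String startKey, String endKey) {
    this.startKey = startKey;
    this.endKey = endKey;
  }
}

/** Hypothetical cache keyed by start key and queried with floorEntry, like MetaCache. */
final class MetaCacheSketch {
  private final TreeMap<String, List<RegionSketch>> cache = new TreeMap<>();

  /** Reported behaviour: all regions end up in ONE entry under the first region's start key. */
  void cacheAllUnderFirstKey(List<RegionSketch> regions) {
    if (!regions.isEmpty()) {
      cache.putIfAbsent(regions.get(0).startKey, regions);
    }
  }

  /** One plausible alternative: one entry per region, keyed by that region's start key. */
  void cacheEachRegion(List<RegionSketch> regions) {
    for (RegionSketch r : regions) {
      cache.putIfAbsent(r.startKey, Collections.singletonList(r));
    }
  }

  /** Like getCachedLocation: floor lookup, then check only the FIRST location's end key. */
  RegionSketch lookup(String row) {
    Map.Entry<String, List<RegionSketch>> e = cache.floorEntry(row);
    if (e == null) {
      return null; // cache miss
    }
    RegionSketch first = e.getValue().get(0);
    boolean rowInRegion = first.endKey.isEmpty() || first.endKey.compareTo(row) > 0;
    return rowInRegion ? first : null;
  }
}
```

With the reported behaviour, every row at or after "m" misses the cache even though its region was fetched during warmup.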




[jira] [Created] (HBASE-13686) Fail to limit rate in RateLimiter

2015-05-13 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-13686:
--

 Summary: Fail to limit rate in RateLimiter
 Key: HBASE-13686
 URL: https://issues.apache.org/jira/browse/HBASE-13686
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 1.1.0
Reporter: Guanghao Zhang
Priority: Minor


While using the patch in HBASE-11598, I found that RateLimiter can't limit the rate correctly.
{code}
  /**
   * given the time interval, are there enough available resources to allow execution?
   * @param now the current timestamp
   * @param lastTs the timestamp of the last update
   * @param amount the number of required resources
   * @return true if there are enough available resources, otherwise false
   */
  public synchronized boolean canExecute(final long now, final long lastTs,
      final long amount) {
    return avail >= amount ? true : refill(now, lastTs) >= amount;
  }
{code}
When avail >= amount, avail is not refilled. But by the next call to canExecute, lastTs may have been updated, so the refill for the elapsed time is lost. Even if we use a rate smaller than the limit, canExecute will return false.
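One way to avoid losing refill time is to refill from the elapsed time on every check, whether or not avail already covers the request. The sketch below is a simple token bucket under that assumption; `RefillOnEveryCheckLimiter` is hypothetical and is not HBase's RateLimiter.

```java
/** Hypothetical token bucket that refills on every check, so elapsed time is never skipped. */
final class RefillOnEveryCheckLimiter {
  private final long limit;      // max resources available per time unit
  private final long unitMillis; // length of the time unit in milliseconds
  private long avail;
  private long lastRefillTs;

  RefillOnEveryCheckLimiter(long limit, long unitMillis, long startTs) {
    this.limit = limit;
    this.unitMillis = unitMillis;
    this.avail = limit;
    this.lastRefillTs = startTs;
  }

  /** Credits resources for the time since the last refill, capped at limit. */
  private void refill(long now) {
    long delta = limit * (now - lastRefillTs) / unitMillis;
    if (delta > 0) {
      avail = Math.min(limit, avail + delta);
      lastRefillTs = now; // advanced only when something was credited
    }
  }

  synchronized boolean canExecute(long now, long amount) {
    refill(now); // always refill first, even when avail >= amount
    return avail >= amount;
  }

  synchronized void consume(long amount) {
    avail -= amount;
  }
}
```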



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HBASE-13829) Add more ThrottleType

2015-06-03 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-13829:
--

 Summary: Add more ThrottleType
 Key: HBASE-13829
 URL: https://issues.apache.org/jira/browse/HBASE-13829
 Project: HBase
  Issue Type: Improvement
  Components: Client
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
 Fix For: 2.0.0


HBASE-11598 added simple throttling to HBase, but the client doesn't support setting a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM or READ_SIZE.




[jira] [Created] (HBASE-13888) refill bug from HBASE-13686

2015-06-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-13888:
--

 Summary: refill bug from HBASE-13686
 Key: HBASE-13888
 URL: https://issues.apache.org/jira/browse/HBASE-13888
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


After I reported in HBASE-13686 that RateLimiter fails to limit the rate, [~ashish singhi] fixed that problem by supporting two kinds of RateLimiter: AverageIntervalRateLimiter and FixedIntervalRateLimiter. But while using the code, I found a new bug in refill() in AverageIntervalRateLimiter.
{code}
long delta = (limit * (now - nextRefillTime)) / super.getTimeUnitInMillis();
if (delta > 0) {
  this.nextRefillTime = now;
  return Math.min(limit, available + delta);
}   
{code}
When delta > 0, refill may return available + delta. Then canExecute() adds refillAmount to avail again, so the new avail may be 2 * avail + delta.
{code}
long refillAmount = refill(limit, avail);
if (refillAmount == 0 && avail < amount) {
  return false;
}   
// check for positive overflow
if (avail <= Long.MAX_VALUE - refillAmount) {
  avail = Math.max(0, Math.min(avail + refillAmount, limit));
} else {
  avail = Math.max(0, limit);
} 
{code}
I will add more unit tests for RateLimiter in the next few days.
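The double counting can be checked with plain arithmetic. In this sketch (hypothetical helper names, assuming limit = 100, avail = 30 and delta = 10), returning the new total from refill() and then adding it in canExecute() yields 70, i.e. 2 * avail + delta, instead of 40; returning only the increment gives the expected value.

```java
final class RefillArithmeticSketch {
  /** Mirrors the reported refill(): returns the NEW TOTAL (available + delta, capped). */
  static long refillReturningTotal(long limit, long available, long delta) {
    return Math.min(limit, available + delta);
  }

  /** Hypothetical fix: return only the increment, capped so avail cannot exceed limit. */
  static long refillReturningIncrement(long limit, long available, long delta) {
    return Math.max(0, Math.min(delta, limit - available));
  }

  /** Mirrors the update in canExecute(): avail += refillAmount, clamped to [0, limit]. */
  static long applyRefill(long limit, long avail, long refillAmount) {
    if (avail <= Long.MAX_VALUE - refillAmount) {
      return Math.max(0, Math.min(avail + refillAmount, limit));
    }
    return Math.max(0, limit);
  }
}
```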





[jira] [Created] (HBASE-13974) TestRateLimiter#testFixedIntervalResourceAvailability may fail

2015-06-25 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-13974:
--

 Summary: TestRateLimiter#testFixedIntervalResourceAvailability may 
fail
 Key: HBASE-13974
 URL: https://issues.apache.org/jira/browse/HBASE-13974
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


Stacktrace

java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at org.apache.hadoop.hbase.quotas.TestRateLimiter.testFixedIntervalResourceAvailability(TestRateLimiter.java:151)

The code of this unit test:
{code}
 RateLimiter limiter = new FixedIntervalRateLimiter();
 limiter.set(10, TimeUnit.MILLISECONDS);
 
 assertTrue(limiter.canExecute(10));
 limiter.consume(3);
 assertEquals(7, limiter.getAvailable());
 assertFalse(limiter.canExecute(10));
{code}
The limiter refills every millisecond. So if this unit test runs slowly, or is blocked by something else for more than 1 ms, assertFalse(limiter.canExecute(10)) will fail.
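A common way to make such a test deterministic is to inject the clock instead of reading wall time. A minimal sketch, assuming a hypothetical `FixedIntervalLimiterSketch` that takes a `LongSupplier` clock; it is not HBase's FixedIntervalRateLimiter.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-interval limiter that reads time from an injected clock. */
final class FixedIntervalLimiterSketch {
  private final long limit;
  private final long intervalMillis;
  private final LongSupplier clock;
  private long avail;
  private long nextRefillTs;

  FixedIntervalLimiterSketch(long limit, long intervalMillis, LongSupplier clock) {
    this.limit = limit;
    this.intervalMillis = intervalMillis;
    this.clock = clock;
    this.avail = limit;
    this.nextRefillTs = clock.getAsLong() + intervalMillis;
  }

  /** Refills to the full limit once a whole interval has passed on the injected clock. */
  boolean canExecute(long amount) {
    long now = clock.getAsLong();
    if (now >= nextRefillTs) {
      avail = limit;
      nextRefillTs = now + intervalMillis;
    }
    return avail >= amount;
  }

  void consume(long amount) {
    avail -= amount;
  }

  long getAvailable() {
    return avail;
  }
}
```

Because the test controls the clock, a slow or paused test thread can no longer trigger an unexpected refill.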




[jira] [Created] (HBASE-13987) Modify the result of shell cmd list_quotas when not enable quota

2015-06-29 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-13987:
--

 Summary: Modify the result of shell cmd list_quotas when not 
enable quota
 Key: HBASE-13987
 URL: https://issues.apache.org/jira/browse/HBASE-13987
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.0.0
 Environment: When quota is not enabled, running the shell command list_quotas gives the following result:

hbase(main):008:0> list_quotas
OWNER QUOTAS

ERROR: Unknown table hbase:quota!

This is confusing if the user doesn't know that quotas are stored in hbase:quota. I added a check of isQuotaEnabled before scanning the hbase:quota table, so it will return "ERROR: quota support disabled", the same as set_quota.
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor







[jira] [Created] (HBASE-21127) TableRecordReader need to handle cursor result too

2018-08-29 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21127:
--

 Summary: TableRecordReader need to handle cursor result too
 Key: HBASE-21127
 URL: https://issues.apache.org/jira/browse/HBASE-21127
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


TableRecordReaderImpl needs to handle cursor results too. Otherwise, nextKeyValue may return false and miss some data when it gets a cursor result.
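A minimal sketch of a reader loop that skips cursor results instead of treating them as the end of the scan. `ResultSketch` and `RecordReaderSketch` are hypothetical stand-ins for HBase's Result and TableRecordReaderImpl; the sketch assumes, as the issue implies, that a cursor result carries no row data.

```java
import java.util.Iterator;

/** Hypothetical scan result; a cursor result only marks scan progress. */
final class ResultSketch {
  final String row; // null for a cursor result
  private final boolean isCursor;

  private ResultSketch(String row, boolean isCursor) {
    this.row = row;
    this.isCursor = isCursor;
  }

  static ResultSketch data(String row) {
    return new ResultSketch(row, false);
  }

  static ResultSketch cursor() {
    return new ResultSketch(null, true);
  }

  boolean isCursor() {
    return isCursor;
  }
}

final class RecordReaderSketch {
  private final Iterator<ResultSketch> scanner;
  private String currentRow;

  RecordReaderSketch(Iterator<ResultSketch> scanner) {
    this.scanner = scanner;
  }

  /** Returns false only when the scanner is exhausted, never because of a cursor result. */
  boolean nextKeyValue() {
    while (scanner.hasNext()) {
      ResultSketch r = scanner.next();
      if (r.isCursor()) {
        continue; // progress marker only: keep scanning instead of returning false
      }
      currentRow = r.row;
      return true;
    }
    return false;
  }

  String getCurrentRow() {
    return currentRow;
  }
}
```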




[jira] [Created] (HBASE-21136) Fix failed ut TestMultiTableSnapshotInputFormat

2018-08-31 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21136:
--

 Summary: Fix failed ut TestMultiTableSnapshotInputFormat
 Key: HBASE-21136
 URL: https://issues.apache.org/jira/browse/HBASE-21136
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


See https://builds.apache.org/job/PreCommit-HBASE-Build/14260/testReport/




[jira] [Created] (HBASE-21251) Refactor RegionMover

2018-09-28 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21251:
--

 Summary: Refactor RegionMover
 Key: HBASE-21251
 URL: https://issues.apache.org/jira/browse/HBASE-21251
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


1. Move the connection and admin to RegionMover's member variables, so there is no need to create a connection many times.
2. Use try-with-resources to reduce code.
3. Use ServerName instead of String.
4. Don't use deprecated methods.
5. Remove duplicate code.
...
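Points 1 and 2 can be sketched as a mover that opens one connection in its constructor, reuses it for every move, and implements AutoCloseable so callers can use try-with-resources. `ConnectionSketch` and `RegionMoverSketch` are hypothetical, and servers are plain strings here only to keep the sketch self-contained; the real RegionMover API is not shown in this issue.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical connection that counts how many times one is opened. */
final class ConnectionSketch implements AutoCloseable {
  static final AtomicInteger OPENED = new AtomicInteger();
  boolean closed;

  ConnectionSketch() {
    OPENED.incrementAndGet();
  }

  void move(String region, String targetServer) {
    if (closed) {
      throw new IllegalStateException("connection closed");
    }
    // a real implementation would issue the move RPC here
  }

  @Override
  public void close() {
    closed = true;
  }
}

final class RegionMoverSketch implements AutoCloseable {
  private final ConnectionSketch conn = new ConnectionSketch(); // created once, reused

  void unload(List<String> regions, String targetServer) {
    for (String region : regions) {
      conn.move(region, targetServer);
    }
  }

  boolean isClosed() {
    return conn.closed;
  }

  @Override
  public void close() {
    conn.close();
  }
}
```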




[jira] [Created] (HBASE-21277) prevent to add same table to two sync replication peer's config

2018-10-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21277:
--

 Summary: prevent to add same table to two sync replication peer's 
config
 Key: HBASE-21277
 URL: https://issues.apache.org/jira/browse/HBASE-21277
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


If a table is in two sync replication peers' configs, its WAL needs to be written to three places: the local dir and two remote dirs. This is not allowed, so we need to add a check when adding a sync replication peer or modifying a sync replication peer's config.
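A minimal sketch of the proposed check, assuming each sync replication peer's config can be reduced to a set of table names. `SyncPeerTableCheck.checkNoOverlap` is hypothetical; when modifying a peer, that peer's own current config is skipped so it does not conflict with itself.

```java
import java.util.Map;
import java.util.Set;

final class SyncPeerTableCheck {
  /**
   * Throws if any table in newTables already belongs to another sync replication peer.
   * existingSyncPeers maps peerId to the tables of that peer.
   */
  static void checkNoOverlap(String peerId, Set<String> newTables,
      Map<String, Set<String>> existingSyncPeers) {
    for (Map.Entry<String, Set<String>> e : existingSyncPeers.entrySet()) {
      if (e.getKey().equals(peerId)) {
        continue; // modifying this peer: its old config is not a conflict
      }
      for (String table : newTables) {
        if (e.getValue().contains(table)) {
          throw new IllegalArgumentException("Table " + table
              + " is already in sync replication peer " + e.getKey());
        }
      }
    }
  }
}
```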




[jira] [Created] (HBASE-21289) Remove the log "'hbase.regionserver.maxlogs' was deprecated." in AbstractFSWAL

2018-10-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21289:
--

 Summary: Remove the log "'hbase.regionserver.maxlogs' was 
deprecated." in AbstractFSWAL
 Key: HBASE-21289
 URL: https://issues.apache.org/jira/browse/HBASE-21289
 Project: HBase
  Issue Type: Improvement
    Reporter: Guanghao Zhang


This log was added by HBASE-14951, but that issue's description and release note never said this config was deprecated. I thought HBASE-14951 only changed the default value of maxlogs (please correct me if I am wrong), and we still use this config in the HBase book. So the log "'hbase.regionserver.maxlogs' was deprecated." in AbstractFSWAL is confusing. Let's remove it. FYI [~vrodionov]




[jira] [Created] (HBASE-21290) No need to instantiate BlockCache for master which not carry table

2018-10-11 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21290:
--

 Summary: No need to instantiate BlockCache for master which not 
carry table
 Key: HBASE-21290
 URL: https://issues.apache.org/jira/browse/HBASE-21290
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


In our production clusters, we use different JVM configs for master and regionserver but the same hbase-site.xml for both, and the master has a small heap/offheap config. So the regionserver's hbase.bucketcache.size is not suitable for the master. I think we don't need to instantiate a BlockCache for a master which doesn't carry tables.





[jira] [Created] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21365:
--

 Summary: Throw exception when user put data with skip wal to a 
table which may be replicated
 Key: HBASE-21365
 URL: https://issues.apache.org/jira/browse/HBASE-21365
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


This is a real problem from our production cluster. A user pointed out that his table's data couldn't be replicated to the peer cluster, so we started to debug. We checked the replication scope, the replication WAL entry filter, and the namespace and table-cfs config, but didn't find any problem. We enabled the RS debug log to find the reason, and finally found that the user wrote data with puts that skip the WAL. This took a long time to track down. Our replication uses the WAL to replicate data, so that data can't be replicated to the peer cluster. I think throwing an exception would be better for the user if the table's replication scope is not 0 (0 means not replicated).
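A minimal sketch of the proposed check, assuming a per-table replication scope where 0 means local (not replicated). `DurabilitySketch` only mirrors the idea of a durability setting and `SkipWalCheck` is hypothetical; which exception HBase would actually throw is not specified in the issue.

```java
enum DurabilitySketch { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

final class SkipWalCheck {
  static final int REPLICATION_SCOPE_LOCAL = 0; // 0 means the data is not replicated

  /** Rejects a SKIP_WAL write to a table whose edits are expected to be replicated from the WAL. */
  static void checkPut(DurabilitySketch durability, int tableReplicationScope) {
    if (durability == DurabilitySketch.SKIP_WAL
        && tableReplicationScope != REPLICATION_SCOPE_LOCAL) {
      throw new IllegalArgumentException(
          "SKIP_WAL writes cannot be replicated; replication scope is " + tableReplicationScope);
    }
  }
}
```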




[jira] [Created] (HBASE-21366) Optionally ignore edits for deleted columns when replication

2018-10-23 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21366:
--

 Summary: Optionally ignore edits for deleted columns when 
replication
 Key: HBASE-21366
 URL: https://issues.apache.org/jira/browse/HBASE-21366
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


HBASE-12091 introduced a config to ignore edits for dropped tables during replication, because we may drop a table in both the source and peer clusters while some of its edits are still in the WAL waiting to be replicated. The same problem occurs when we delete columns of a table in both the source and peer clusters: the replication thread will hang on NoSuchColumnException while some edits for those columns remain in the WAL. We can use the same config to ignore WAL edits for deleted columns, too.
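A minimal sketch of the filtering idea, assuming the sink knows which column families currently exist in the peer table and a boolean flag plays the role of the HBASE-12091 config. `EditSketch` and `DeletedColumnFilterSketch` are hypothetical, and the real config key is not named here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical WAL edit: one cell in a column family. */
final class EditSketch {
  final String family;
  final String row;

  EditSketch(String family, String row) {
    this.family = family;
    this.row = row;
  }
}

final class DeletedColumnFilterSketch {
  /**
   * When ignoreDeleted is true, drops edits whose family no longer exists in the peer table
   * instead of letting them fail replication; otherwise returns all edits unchanged.
   */
  static List<EditSketch> filter(List<EditSketch> edits, Set<String> existingFamilies,
      boolean ignoreDeleted) {
    if (!ignoreDeleted) {
      return edits;
    }
    List<EditSketch> kept = new ArrayList<>();
    for (EditSketch e : edits) {
      if (existingFamilies.contains(e.family)) {
        kept.add(e);
      }
    }
    return kept;
  }
}
```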




[jira] [Created] (HBASE-21367) Add table/region/row's statistics output for WALPrettyPrinter

2018-10-23 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21367:
--

 Summary: Add table/region/row's statistics output for 
WALPrettyPrinter
 Key: HBASE-21367
 URL: https://issues.apache.org/jira/browse/HBASE-21367
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


This is a real case from our production cluster. We found that one RS's replication peer replicated very slowly, much more slowly than on other RSs. We used WALPrettyPrinter to output the WAL's edits and found that 90% of the edits were for the same row. It was a bug in the user's MR job: the job kept updating the same row, and replication to the peer cluster was very slow because we need to replicate every update. A statistics output for table/region/row would help us find such problems quickly. It could look like the following.
||Table/Region/Row||edits number||
|t1|x|
|region2|x|
|row3|x|
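A minimal sketch of how such counts could be gathered, assuming each WAL entry can be reduced to (table, region, row). `WalStatsSketch` is hypothetical and not part of WALPrettyPrinter; `hottestRow` surfaces the kind of single hot row described above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-table/region/row edit counter for WAL entries. */
final class WalStatsSketch {
  final Map<String, Long> perTable = new TreeMap<>();
  final Map<String, Long> perRegion = new TreeMap<>();
  final Map<String, Long> perRow = new TreeMap<>(); // keyed as "table/row"

  /** Counts one edit against its table, region and row. */
  void add(String table, String region, String row) {
    perTable.merge(table, 1L, Long::sum);
    perRegion.merge(region, 1L, Long::sum);
    perRow.merge(table + "/" + row, 1L, Long::sum);
  }

  /** Returns the row with the most edits, or null if nothing was counted. */
  String hottestRow() {
    String best = null;
    long bestCount = 0;
    for (Map.Entry<String, Long> e : perRow.entrySet()) {
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }
}
```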




[jira] [Created] (HBASE-21385) HTable.delete request use rpc call directly instead of AsyncProcess

2018-10-24 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21385:
--

 Summary: HTable.delete request use rpc call directly instead of 
AsyncProcess
 Key: HBASE-21385
 URL: https://issues.apache.org/jira/browse/HBASE-21385
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


HBASE-16592 unified delete requests to use AsyncProcess, but the unification is not complete: we still use a direct RPC call for get, put, append, and increment, and only use AsyncProcess for batch requests. I also found a problem in HBASE-21365: the RPC call throws a DoNotRetryException, but AsyncProcess wraps it in a new RetriesExhaustedWithDetailsException, which is not right. So I think HTable.delete should use the RPC call directly, the same as get, put, append and increment requests.




[jira] [Created] (HBASE-21388) No need to instantiate MemStore for master which not carry table

2018-10-25 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21388:
--

 Summary: No need to instantiate MemStore for master which not 
carry table
 Key: HBASE-21388
 URL: https://issues.apache.org/jira/browse/HBASE-21388
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


We found this log in our master.

2018-10-26,10:00:00,449 INFO [master/c4-hadoop-tst-ct16:42900:becomeActiveMaster] org.apache.hadoop.hbase.regionserver.ChunkCreator: Allocating data MemStoreChunkPool with chunk size 2 MB, max count 737, initial count 0
2018-10-26,10:00:00,452 INFO [master/c4-hadoop-tst-ct16:42900:becomeActiveMaster] org.apache.hadoop.hbase.regionserver.ChunkCreator: Allocating index MemStoreChunkPool with chunk size 204.80 KB, max count 819, initial count 0

As with HBASE-21290, we don't need to instantiate MemStore for a master which doesn't carry tables.




[jira] [Created] (HBASE-21420) Use procedure event to wake up the SyncReplicationReplayWALProcedures which wait for worker

2018-11-01 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21420:
--

 Summary: Use procedure event to wake up the 
SyncReplicationReplayWALProcedures which wait for worker
 Key: HBASE-21420
 URL: https://issues.apache.org/jira/browse/HBASE-21420
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


Currently, if a SyncReplicationReplayWALProcedure fails to get a worker, it sleeps with backoff and retries. So when a finished SyncReplicationReplayWALProcedure releases a worker, it can take a long time before a waiting procedure wakes up, gets the worker, and runs.
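A minimal sketch of the event-style alternative, assuming a fixed number of workers and that a waiting procedure can register a callback instead of sleeping. `WorkerPoolSketch` is hypothetical and is not HBase's procedure event API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class WorkerPoolSketch {
  private int freeWorkers;
  private final Deque<Runnable> waiters = new ArrayDeque<>();

  WorkerPoolSketch(int workers) {
    this.freeWorkers = workers;
  }

  /**
   * Takes a worker if one is free; otherwise queues onWorkerAvailable to run as soon as
   * a worker is released, instead of sleeping with backoff.
   */
  synchronized boolean tryAcquire(Runnable onWorkerAvailable) {
    if (freeWorkers > 0) {
      freeWorkers--;
      return true;
    }
    waiters.addLast(onWorkerAvailable);
    return false;
  }

  /** Hands the released worker straight to the oldest waiter, if any. */
  void release() {
    Runnable next;
    synchronized (this) {
      next = waiters.pollFirst();
      if (next == null) {
        freeWorkers++;
        return;
      }
    }
    next.run(); // the waiter now owns the worker; woken outside the lock
  }
}
```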




[jira] [Created] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-11-18 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21498:
--

 Summary: Master OOM when SplitTableRegionProcedure new CacheConfig 
and instantiate a new BlockCache
 Key: HBASE-21498
 URL: https://issues.apache.org/jira/browse/HBASE-21498
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


In our cluster, we use a small heap/offheap config for the master. After HBASE-21290, the master doesn't instantiate a BlockCache when it doesn't carry tables. But SplitTableRegionProcedure.splitStoreFiles creates a new CacheConfig, which instantiates a new BlockCache if none was initialized before, and this makes the master OOM.




[jira] [Created] (HBASE-21514) Refactor CacheConfig

2018-11-25 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21514:
--

 Summary: Refactor CacheConfig
 Key: HBASE-21514
 URL: https://issues.apache.org/jira/browse/HBASE-21514
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


One basic idea is to move the global cache instances out of CacheConfig and keep only configuration in CacheConfig.
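A minimal sketch of that direction, assuming the block cache is created once by its owner (for example a region server that carries regions) and passed in, so that constructing a config never allocates a cache. `BlockCacheSketch` and `CacheConfigSketch` are hypothetical and much simpler than HBase's CacheConfig.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache; counts allocations so the sketch can show none happen implicitly. */
final class BlockCacheSketch {
  static final AtomicInteger CREATED = new AtomicInteger();

  BlockCacheSketch() {
    CREATED.incrementAndGet();
  }
}

/** Holds only configuration plus an optional, externally owned cache; no static instance. */
final class CacheConfigSketch {
  private final boolean cacheDataOnRead;
  private final Optional<BlockCacheSketch> blockCache;

  CacheConfigSketch(boolean cacheDataOnRead, Optional<BlockCacheSketch> blockCache) {
    this.cacheDataOnRead = cacheDataOnRead;
    this.blockCache = blockCache;
  }

  /** For a process without a cache, e.g. a master that carries no tables. */
  static CacheConfigSketch withoutCache(boolean cacheDataOnRead) {
    return new CacheConfigSketch(cacheDataOnRead, Optional.empty());
  }

  boolean shouldCacheDataOnRead() {
    return cacheDataOnRead && blockCache.isPresent();
  }

  Optional<BlockCacheSketch> getBlockCache() {
    return blockCache;
  }
}
```

Code such as a split procedure would then receive the existing cache, or none, rather than triggering a lazily created global one.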




[jira] [Created] (HBASE-21549) Add shell command for serial replication peer

2018-12-04 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21549:
--

 Summary: Add shell command for serial replication peer
 Key: HBASE-21549
 URL: https://issues.apache.org/jira/browse/HBASE-21549
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang







[jira] [Created] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21554:
--

 Summary: Show replication endpoint classname for replication peer 
on master web UI
 Key: HBASE-21554
 URL: https://issues.apache.org/jira/browse/HBASE-21554
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang







[jira] [Created] (HBASE-21560) Return a new TableDescriptor for MasterObserver#preModifyTable to allow coprocessor modify the TableDescriptor

2018-12-06 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21560:
--

 Summary: Return a new TableDescriptor for 
MasterObserver#preModifyTable to allow coprocessor modify the TableDescriptor
 Key: HBASE-21560
 URL: https://issues.apache.org/jira/browse/HBASE-21560
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


Same as HBASE-21550. TableDescriptor is immutable in 2.0+, but in our use case a coprocessor may change the TableDescriptor in preModifyTable, which was allowed before 2.0. For 2.0+, we can return a new TableDescriptor from MasterObserver#preModifyTable to allow this.
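A minimal sketch of the proposed hook shape, assuming descriptors are immutable and the caller applies whatever descriptor each observer returns. `TableDescriptorSketch`, `ModifyTableObserverSketch` and `ModifyTableSketch` are hypothetical; the real MasterObserver signature is not given in the issue.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable descriptor: "modifying" it returns a new instance. */
final class TableDescriptorSketch {
  final String name;
  final Map<String, String> values;

  TableDescriptorSketch(String name, Map<String, String> values) {
    this.name = name;
    this.values = Collections.unmodifiableMap(new HashMap<>(values));
  }

  TableDescriptorSketch withValue(String key, String value) {
    Map<String, String> copy = new HashMap<>(values);
    copy.put(key, value);
    return new TableDescriptorSketch(name, copy);
  }
}

interface ModifyTableObserverSketch {
  /** Returns the descriptor to use; the default keeps the requested one unchanged. */
  default TableDescriptorSketch preModifyTable(TableDescriptorSketch newDescriptor) {
    return newDescriptor;
  }
}

final class ModifyTableSketch {
  /** Threads the descriptor through each observer, using what each one returns. */
  static TableDescriptorSketch runObservers(TableDescriptorSketch requested,
      List<ModifyTableObserverSketch> observers) {
    TableDescriptorSketch current = requested;
    for (ModifyTableObserverSketch o : observers) {
      current = o.preModifyTable(current);
    }
    return current;
  }
}
```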




[jira] [Created] (HBASE-21604) Move the memstore chunk creator to HRegionServer's member variable

2018-12-14 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21604:
--

 Summary: Move the memstore chunk creator to HRegionServer's member 
variable
 Key: HBASE-21604
 URL: https://issues.apache.org/jira/browse/HBASE-21604
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


Same idea as HBASE-21514: we should keep the chunk creator at the RegionServer level instead of the JVM process level.




[jira] [Reopened] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-12-18 Thread Guanghao Zhang (JIRA)



 [ https://issues.apache.org/jira/browse/HBASE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang reopened HBASE-21498:


Reopening for branch-2.0 and branch-2.1.

> Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a 
> new BlockCache
> --
>
> Key: HBASE-21498
> URL: https://issues.apache.org/jira/browse/HBASE-21498
> Project: HBase
>  Issue Type: Improvement
>        Reporter: Guanghao Zhang
>    Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21498.master.001.patch, 
> HBASE-21498.master.002.patch, HBASE-21498.master.003.patch, 
> HBASE-21498.master.004.patch, HBASE-21498.master.005.patch, 
> HBASE-21498.master.006.patch, HBASE-21498.master.006.patch, 
> HBASE-21498.master.007.patch, HBASE-21498.master.007.patch
>
>
> In our cluster, we use a small heap/off-heap config for the master. After 
> HBASE-21290, the master doesn't instantiate a BlockCache when it does not carry 
> tables. But SplitTableRegionProcedure.splitStoreFiles creates a new CacheConfig, 
> which instantiates a new BlockCache if one has not been initialized before, and 
> this makes the master OOM.
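
The failure mode can be modeled with a small sketch. All names here are hypothetical simplifications rather than HBase's `CacheConfig`/`BlockCache` classes: `LazyCacheConfig` instantiates a process-wide cache the first time any config object is built, while `ExplicitCacheConfig` only uses a cache the caller already owns, so a master that never created one stays cache-less.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CacheConfigSketch {

  /** Hypothetical block cache; constructing one stands in for reserving a large memory area. */
  static final class BlockCache {
    static final AtomicInteger INSTANCES = new AtomicInteger();
    final long capacityBytes;

    BlockCache(long capacityBytes) {
      this.capacityBytes = capacityBytes;
      INSTANCES.incrementAndGet();
    }
  }

  /** Problematic pattern: building the config lazily creates a process-wide cache. */
  static final class LazyCacheConfig {
    private static BlockCache globalCache; // shared, created on first use

    final BlockCache cache;

    LazyCacheConfig(long configuredCacheBytes) {
      synchronized (LazyCacheConfig.class) {
        if (globalCache == null) {
          // On a master with a small heap, this is the allocation that can cause the OOM.
          globalCache = new BlockCache(configuredCacheBytes);
        }
        cache = globalCache;
      }
    }
  }

  /** Alternative pattern: the caller passes in the cache it owns, or null for "no cache". */
  static final class ExplicitCacheConfig {
    final BlockCache cache; // may be null

    ExplicitCacheConfig(BlockCache cache) {
      this.cache = cache;
    }

    boolean isBlockCacheEnabled() {
      return cache != null;
    }
  }

  public static void main(String[] args) {
    // A master that carries no tables never created a cache; a cache-less config
    // used while splitting store files does not allocate one behind its back.
    ExplicitCacheConfig forSplit = new ExplicitCacheConfig(null);
    System.out.println(forSplit.isBlockCacheEnabled() + " " + BlockCache.INSTANCES.get()); // false 0

    new LazyCacheConfig(64L * 1024 * 1024);
    System.out.println(BlockCache.INSTANCES.get()); // 1
  }
}
```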

[jira] [Resolved] (HBASE-21498) Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a new BlockCache

2018-12-18 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-21498.

   Resolution: Fixed
Fix Version/s: 2.0.4
   2.1.2

Pushed to branch-2.1 and branch-2.0. Thanks [~stack] for reviewing.

> Master OOM when SplitTableRegionProcedure new CacheConfig and instantiate a 
> new BlockCache
> --
>
> Key: HBASE-21498
> URL: https://issues.apache.org/jira/browse/HBASE-21498
> Project: HBase
>  Issue Type: Improvement
>        Reporter: Guanghao Zhang
>    Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21498.master.001.patch, 
> HBASE-21498.master.002.patch, HBASE-21498.master.003.patch, 
> HBASE-21498.master.004.patch, HBASE-21498.master.005.patch, 
> HBASE-21498.master.006.patch, HBASE-21498.master.006.patch, 
> HBASE-21498.master.007.patch, HBASE-21498.master.007.patch

[jira] [Created] (HBASE-21640) Remove the TODO when increment zero

2018-12-25 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21640:
--

 Summary: Remove the TODO when increment zero
 Key: HBASE-21640
 URL: https://issues.apache.org/jira/browse/HBASE-21640
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


 
{code:java}
// If delta amount to apply is 0, don't write WAL or MemStore.
long deltaAmount = getLongValue(delta);
// TODO: Does zero value mean reset Cell? For example, the ttl.
apply = deltaAmount != 0;
{code}
This is an optimization for incrementing by 0, but it introduces some new problems.

1. As the TODO says, does a zero value mean resetting the TTL?

2. HBASE-17318 had to introduce a new variable, "firstWrite", because a zero 
delta is not applied.

3. The coprocessor method postMutationBeforeWAL can return a new cell, but that 
cell may not be applied:

{code:java}
// Give coprocessors a chance to update the new cell
if (coprocessorHost != null) {
  newCell =
      coprocessorHost.postMutationBeforeWAL(mutationType, mutation, currentValue, newCell);
}
// If apply, we need to update memstore/WAL with new value; add it toApply.
if (apply || firstWrite) {
  toApply.add(newCell);
}
{code}
 

So my proposal is to remove this optimization. Any suggestions are welcome.
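
A toy model of the trade-off, assuming a cell is just a value plus a timestamp and using a `UnaryOperator` in place of the postMutationBeforeWAL hook (`Cell`, `toApply` and `skipZero` are hypothetical names). With `skipZero` on, an increment by 0 on an existing cell writes nothing, so the timestamp is not refreshed and the coprocessor's cell is dropped; with it off, the cell is always applied. Whether a zero increment should refresh the TTL is exactly the open question raised in the TODO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class IncrementZeroSketch {

  /** Hypothetical simplified cell: a counter value plus its write timestamp (drives TTL). */
  static final class Cell {
    final long value;
    final long timestamp;

    Cell(long value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return "Cell(value=" + value + ", ts=" + timestamp + ")";
    }
  }

  /**
   * Returns the cells to write to memstore/WAL for one increment.
   *
   * @param current     existing cell, or null if the column was never written
   * @param coprocessor may rewrite the new cell (stands in for postMutationBeforeWAL)
   * @param skipZero    true = current optimization, false = proposed behavior
   */
  static List<Cell> toApply(Cell current, long delta, long now,
      UnaryOperator<Cell> coprocessor, boolean skipZero) {
    boolean firstWrite = current == null;
    long base = firstWrite ? 0 : current.value;
    Cell newCell = coprocessor.apply(new Cell(base + delta, now));
    boolean apply = !skipZero || delta != 0;
    List<Cell> out = new ArrayList<>();
    if (apply || firstWrite) {
      out.add(newCell);
    }
    return out;
  }

  public static void main(String[] args) {
    Cell existing = new Cell(5, 100);
    // A coprocessor that rewrites the value, e.g. to encode extra state.
    UnaryOperator<Cell> cp = c -> new Cell(c.value + 1000, c.timestamp);

    // Optimization on: incrementing by 0 writes nothing at all.
    System.out.println(toApply(existing, 0, 200, cp, true));  // []
    // Proposed: always apply the (possibly coprocessor-modified) cell.
    System.out.println(toApply(existing, 0, 200, cp, false)); // [Cell(value=1005, ts=200)]
  }
}
```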

[jira] [Created] (HBASE-21643) Introduce two new region coprocessor method and deprecated postMutationBeforeWAL

2018-12-26 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21643:
--

 Summary: Introduce two new region coprocessor method and deprecated postMutationBeforeWAL
 Key: HBASE-21643
 URL: https://issues.apache.org/jira/browse/HBASE-21643
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


The name of the old method postMutationBeforeWAL does not accurately describe 
what it does. It is only called during increment and append, yet the name says 
"Mutation". And the javadoc only says it is called during increment:

{code:java}
* Called after a new cell has been created during an increment operation, but before
* it is committed to the WAL or memstore.
{code}

We use this coprocessor hook in our use case, and we need to add some extra 
cells to be applied to the WAL. So I introduced two new methods, 
postIncrementBeforeWAL and postAppendBeforeWAL, to replace it.
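
A simplified model of the two hook shapes follows. It is not the exact HBase coprocessor signature: `Cell`, `CellPair`, `OldObserver`, `NewObserver`, `AuditObserver` and `cellsToApply` are hypothetical. The point is that a per-operation hook which receives and returns a list of (old cell, new cell) pairs can add extra cells to be written to the WAL, which a one-cell-in, one-cell-out hook cannot.

```java
import java.util.ArrayList;
import java.util.List;

public class BeforeWalHooksSketch {

  /** Hypothetical simplified cell: qualifier and value only. */
  static final class Cell {
    final String qualifier;
    final String value;

    Cell(String qualifier, String value) {
      this.qualifier = qualifier;
      this.value = value;
    }

    @Override
    public String toString() {
      return qualifier + "=" + value;
    }
  }

  /** A pair of (existing cell or null, newly computed cell). */
  static final class CellPair {
    final Cell oldCell;
    final Cell newCell;

    CellPair(Cell oldCell, Cell newCell) {
      this.oldCell = oldCell;
      this.newCell = newCell;
    }
  }

  /** Old shape: one hook for both operations, one cell in and one cell out. */
  interface OldObserver {
    default Cell postMutationBeforeWAL(boolean isIncrement, Cell oldCell, Cell newCell) {
      return newCell;
    }
  }

  /** New shape: one hook per operation, working on the whole list of cells. */
  interface NewObserver {
    default List<CellPair> postIncrementBeforeWAL(List<CellPair> pairs) { return pairs; }
    default List<CellPair> postAppendBeforeWAL(List<CellPair> pairs) { return pairs; }
  }

  /** Example observer: adds an audit cell next to every incremented counter. */
  static final class AuditObserver implements NewObserver {
    @Override
    public List<CellPair> postIncrementBeforeWAL(List<CellPair> pairs) {
      List<CellPair> out = new ArrayList<>(pairs);
      for (CellPair p : pairs) {
        out.add(new CellPair(null, new Cell(p.newCell.qualifier + "_audit", "incremented")));
      }
      return out;
    }
  }

  /** Region side (simplified): collect the new cells that go to the WAL and memstore. */
  static List<Cell> cellsToApply(NewObserver observer, List<CellPair> pairs, boolean isIncrement) {
    List<CellPair> result = isIncrement
        ? observer.postIncrementBeforeWAL(pairs)
        : observer.postAppendBeforeWAL(pairs);
    List<Cell> cells = new ArrayList<>();
    for (CellPair p : result) {
      cells.add(p.newCell);
    }
    return cells;
  }

  public static void main(String[] args) {
    List<CellPair> pairs = List.of(new CellPair(new Cell("c", "1"), new Cell("c", "2")));
    System.out.println(cellsToApply(new AuditObserver(), pairs, true));  // [c=2, c_audit=incremented]
    System.out.println(cellsToApply(new AuditObserver(), pairs, false)); // [c=2]
  }
}
```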

[jira] [Created] (HBASE-21659) Avoid to load duplicate coprocessors in system config and table descriptor

2018-12-28 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21659:
--

 Summary: Avoid to load duplicate coprocessors in system config and table descriptor
 Key: HBASE-21659
 URL: https://issues.apache.org/jira/browse/HBASE-21659
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang

[jira] [Created] (HBASE-21660) Apply the cell to right memstore for increment/append operation

2018-12-28 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21660:
--

 Summary: Apply the cell to right memstore for increment/append operation
 Key: HBASE-21660
 URL: https://issues.apache.org/jira/browse/HBASE-21660
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang

[jira] [Created] (HBASE-21691) Fix flaky test TestRecoveredEdits

2019-01-07 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21691:
--

 Summary: Fix flaky test TestRecoveredEdits
 Key: HBASE-21691
 URL: https://issues.apache.org/jira/browse/HBASE-21691
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang


TestRecoveredEdits has failed many times in precommit jobs. See the flaky test report:

https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/master/Flaky_20Test_20Report/

[jira] [Created] (HBASE-21695) Fix flaky test TestRegionServerAbortTimeout

2019-01-08 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21695:
--

 Summary: Fix flaky test TestRegionServerAbortTimeout
 Key: HBASE-21695
 URL: https://issues.apache.org/jira/browse/HBASE-21695
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


[https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/master/Flaky_20Test_20Report/]

[jira] [Reopened] (HBASE-21618) Scan with the same startRow(inclusive=true) and stopRow(inclusive=false) returns one result

2019-01-08 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-21618:


Reopen to add a release note.

> Scan with the same startRow(inclusive=true) and stopRow(inclusive=false) 
> returns one result
> ---
>
> Key: HBASE-21618
> URL: https://issues.apache.org/jira/browse/HBASE-21618
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.2
> Environment: hbase server 2.0.2
> hbase client 2.0.0
>Reporter: Jermy Li
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.2, 2.0.4, 1.4.10
>
> Attachments: HBASE-21618.branch-1.001.patch, 
> HBASE-21618.master.001.patch, HBASE-21618.master.002.patch, 
> HBASE-21618.master.003.patch
>
>
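> I expect the following code to return no result, but it still returns a row:
> {code:java}
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
> // htable is an existing org.apache.hadoop.hbase.client.Table.
> byte[] rowkey = Bytes.toBytes("some key existed");
> Scan scan = new Scan();
> scan.withStartRow(rowkey, true);  // inclusive start
> scan.withStopRow(rowkey, false);  // exclusive stop: [rowkey, rowkey) is empty
> try (ResultScanner scanner = htable.getScanner(scan)) {
>   System.out.println(scanner.next()); // expected null, but a row is returned
> }
> {code}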
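
For reference, an inclusive start row and an equal, exclusive stop row describe the empty half-open range [rowkey, rowkey), so no row should match. The helper below is a hypothetical illustration of that check, not HBase client code; it assumes rows are compared as unsigned bytes, lexicographically, and treats an empty stop row as unbounded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScanRangeSketch {

  /**
   * Returns true if {@code row} falls inside the scan range. Rows are compared as
   * unsigned bytes, lexicographically; an empty stopRow means "no upper bound".
   */
  static boolean inRange(byte[] row, byte[] startRow, boolean startInclusive,
      byte[] stopRow, boolean stopInclusive) {
    int cmpStart = Arrays.compareUnsigned(row, startRow);
    if (cmpStart < 0 || (cmpStart == 0 && !startInclusive)) {
      return false;
    }
    if (stopRow.length == 0) {
      return true;
    }
    int cmpStop = Arrays.compareUnsigned(row, stopRow);
    return cmpStop < 0 || (cmpStop == 0 && stopInclusive);
  }

  public static void main(String[] args) {
    byte[] key = "some key existed".getBytes(StandardCharsets.UTF_8);
    // [key, key) is empty, so even the row equal to key must not be returned.
    System.out.println(inRange(key, key, true, key, false)); // false
    // [key, key] contains exactly that row.
    System.out.println(inRange(key, key, true, key, true));  // true
  }
}
```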

[jira] [Resolved] (HBASE-21618) Scan with the same startRow(inclusive=true) and stopRow(inclusive=false) returns one result

2019-01-08 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-21618.

Resolution: Fixed

> Scan with the same startRow(inclusive=true) and stopRow(inclusive=false) 
> returns one result
> ---
>
> Key: HBASE-21618
> URL: https://issues.apache.org/jira/browse/HBASE-21618
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.0.2
> Environment: hbase server 2.0.2
> hbase client 2.0.0
>Reporter: Jermy Li
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.0.4, 2.1.2
>
> Attachments: HBASE-21618.branch-1.001.patch, 
> HBASE-21618.master.001.patch, HBASE-21618.master.002.patch, 
> HBASE-21618.master.003.patch

[jira] [Reopened] (HBASE-21034) Add new throttle type: read/write capacity unit

2019-01-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-21034:


Reopen for branch-2.0 and branch-2.1.

> Add new throttle type: read/write capacity unit
> ---
>
> Key: HBASE-21034
> URL: https://issues.apache.org/jira/browse/HBASE-21034
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21034.branch-2.0.001.patch, 
> HBASE-21034.branch-2.1.001.patch, HBASE-21034.master.001.patch, 
> HBASE-21034.master.002.patch, HBASE-21034.master.003.patch, 
> HBASE-21034.master.004.patch, HBASE-21034.master.005.patch, 
> HBASE-21034.master.006.patch, HBASE-21034.master.006.patch, 
> HBASE-21034.master.007.patch, HBASE-21034.master.007.patch
>
>
> Add a new throttle type: read/write capacity units, similar to DynamoDB.
> One read capacity unit represents a read of up to 1 KB of data per time unit. 
> If the data size is larger than 1 KB, additional read capacity units are consumed.
> One write capacity unit represents one write of an item up to 1 KB in size per 
> time unit. If the data size is larger than 1 KB, additional write capacity units 
> are consumed.
> For example, 100 read capacity units per second means that a user can read 1 KB 
> of data 100 times every second, or 2 KB of data 50 times every second, and so on.
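
The capacity unit arithmetic can be sketched as a ceiling division. This is an illustration under stated assumptions, not HBase's quota code: it assumes 1K means 1024 bytes and that every request costs at least one unit; `unitsFor` and `requestsPerSecond` are hypothetical names.

```java
public class CapacityUnitSketch {

  /** Assumed size of one capacity unit: 1 KB = 1024 bytes. */
  static final long UNIT_BYTES = 1024;

  /** Capacity units consumed by one request of {@code sizeBytes}, rounded up. */
  static long unitsFor(long sizeBytes) {
    long units = (sizeBytes + UNIT_BYTES - 1) / UNIT_BYTES;
    return Math.max(1, units); // assumption: every request costs at least one unit
  }

  /** How many requests of the given size fit into a per-second capacity unit budget. */
  static long requestsPerSecond(long unitsPerSecond, long sizeBytes) {
    return unitsPerSecond / unitsFor(sizeBytes);
  }

  public static void main(String[] args) {
    System.out.println(requestsPerSecond(100, 1024)); // 100
    System.out.println(requestsPerSecond(100, 2048)); // 50
    System.out.println(unitsFor(1025));               // 2
  }
}
```

This reproduces the example above: a 100 unit/s budget allows 100 reads of 1 KB or 50 reads of 2 KB per second.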

[jira] [Created] (HBASE-21798) Cut branch-2.2

2019-01-28 Thread Guanghao Zhang (JIRA)

Guanghao Zhang created HBASE-21798:
--

 Summary: Cut branch-2.2
 Key: HBASE-21798
 URL: https://issues.apache.org/jira/browse/HBASE-21798
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


Will cut branch-2.2 from branch-2.