[jira] [Created] (HBASE-21511) Remove in progress snapshot check in SnapshotFileCache#getUnreferencedFiles

2018-11-24 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21511:
--

 Summary: Remove in progress snapshot check in 
SnapshotFileCache#getUnreferencedFiles
 Key: HBASE-21511
 URL: https://issues.apache.org/jira/browse/HBASE-21511
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
 Attachments: 21511.v1.txt

During review of HBASE-21387, [~Apache9] mentioned that the check for
in-progress snapshots in SnapshotFileCache#getUnreferencedFiles is no longer
needed, now that the snapshot HFile cleaner and snapshot creation are mutually
exclusive.

This issue addresses the review comment by removing the check for in-progress
snapshots.
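Conceptually, the simplified lookup reduces to a plain membership test against the cache of completed snapshots. The sketch below is an editor's illustration with made-up class and method names, not the actual SnapshotFileCache code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the cache (not the actual HBase class):
// with the cleaner and snapshot creation mutually exclusive, the cache of
// completed snapshots alone decides whether a file is unreferenced, so no
// separate in-progress scan is needed.
public class SnapshotFileCacheSketch {
  private final Set<String> cachedSnapshotFiles = new HashSet<>();

  public void addSnapshotFile(String file) {
    cachedSnapshotFiles.add(file);
  }

  // Returns the candidate files referenced by no completed snapshot.
  public List<String> getUnreferencedFiles(Collection<String> candidates) {
    List<String> unreferenced = new ArrayList<>();
    for (String f : candidates) {
      if (!cachedSnapshotFiles.contains(f)) {
        unreferenced.add(f); // no in-progress snapshot check any more
      }
    }
    return unreferenced;
  }

  public static void main(String[] args) {
    SnapshotFileCacheSketch cache = new SnapshotFileCacheSketch();
    cache.addSnapshotFile("hfile-a");
    System.out.println(cache.getUnreferencedFiles(Arrays.asList("hfile-a", "hfile-b")));
    // prints [hfile-b]
  }
}
```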



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-23 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21387:


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10
>
> Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, 
> 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, 
> 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, 
> HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, 
> HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, 
> HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, 
> two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt
>
>
> A recent customer report showed ExportSnapshot failing:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is some 
> in-progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check the 
> in-progress snapshot(s). However, the snapshot has completed by that time, 
> resulting in some file(s) being deemed unreferenced.
> Here is the timeline given by Josh illustrating the scenario:
> At time T0, we are checking if F1 is referenced. At time T1, there is a 
> snapshot S1 in progress that is referencing a file F1. refreshCache() is 
> called, but no completed snapshot references F1. At T2, the snapshot S1, 
> which references F1, completes. At T3, we check in-progress snapshots and S1 
> is not included. Thus, F1 is marked as unreferenced even though S1 references 
> it. 
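The timeline above can be re-enacted deterministically. The class and field names below are illustrative only, not HBase code; the point is that the check-then-act window lets F1 fall through both checks:

```java
import java.util.HashSet;
import java.util.Set;

// Deterministic re-enactment of the timeline (illustrative names, not HBase
// code): the cleaner's check-then-act spans a window in which snapshot S1
// completes, so F1 is visible to neither the completed-snapshot cache nor the
// in-progress scan.
public class SnapshotRaceTimeline {
  public static Set<String> completedSnapshotRefs = new HashSet<>(); // built by refreshCache()
  public static Set<String> inProgressRefs = new HashSet<>();

  public static void main(String[] args) {
    // T1: S1 is in progress and references F1; refreshCache() finds no
    // *completed* snapshot referencing F1, so the cache stays empty.
    inProgressRefs.add("F1");

    // T2: S1 completes. Its files leave the in-progress set, but the cache the
    // cleaner is about to consult was built before this happened.
    inProgressRefs.remove("F1");

    // T3: the cleaner checks in-progress snapshots; S1 is no longer there.
    boolean referenced =
        completedSnapshotRefs.contains("F1") || inProgressRefs.contains("F1");
    System.out.println("F1 referenced? " + referenced);
    // prints "F1 referenced? false" -- F1 is wrongly eligible for deletion
  }
}
```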





[jira] [Created] (HBASE-21482) TestHRegion fails due to 'Too many open files'

2018-11-15 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21482:
--

 Summary: TestHRegion fails due to 'Too many open files'
 Key: HBASE-21482
 URL: https://issues.apache.org/jira/browse/HBASE-21482
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


TestHRegion fails due to 'Too many open files' in the master branch.
Here is one failed subtest:
{code}
testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion)
  Time elapsed: 2.373 sec  <<< ERROR!
java.lang.IllegalStateException: failed to create a child event loop
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: 
failed to open a new selector
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
Caused by: java.io.IOException: Too many open files
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034)
{code}





[jira] [Created] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-14 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21479:
--

 Summary: 
TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
fails with IndexOutOfBoundsException
 Key: HBASE-21479
 URL: https://issues.apache.org/jira/browse/HBASE-21479
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


The test fails in both the master branch and branch-2:
{code}
testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)
  Time elapsed: 3.74 sec  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
at 
org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
{code}





[jira] [Created] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir

2018-11-11 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21466:
--

 Summary: WALProcedureStore uses wrong FileSystem if wal.dir is on 
different FileSystem as rootdir
 Key: HBASE-21466
 URL: https://issues.apache.org/jira/browse/HBASE-21466
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


In the WALProcedureStore constructor, the fs field is initialized this way:
{code}
this.fs = walDir.getFileSystem(conf);
{code}
However, when wal.dir is on a different FileSystem than rootdir, the above 
returns the wrong FileSystem.
In the modified TestMasterProcedureEvents, without the fix, the master wouldn't 
initialize.
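The gist of the fix can be sketched with plain java.net.URI instead of Hadoop's FileSystem API, so the names below are illustrative rather than the WALProcedureStore code: the filesystem serving a path is determined by that path's own scheme and authority, so walDir must be resolved against its own URI rather than inheriting the rootdir's filesystem.

```java
import java.net.URI;

// Illustrative sketch using plain java.net.URI rather than Hadoop's FileSystem
// API: the filesystem that serves a path is identified by that path's own
// scheme and authority, so walDir must be resolved against its own URI and
// never inherit the rootdir's filesystem.
public class WalDirFsSketch {
  // Returns the "filesystem identity" (scheme + authority) owning a path.
  static String fsFor(String path) {
    URI u = URI.create(path);
    return u.getScheme() + "://" + u.getAuthority();
  }

  public static void main(String[] args) {
    String rootDir = "wasb://container@account.blob.core.windows.net/hbase";
    String walDir = "hdfs://nn:8020/hbase-wals";
    // Correct: derive the FS from walDir itself.
    System.out.println(fsFor(walDir)); // prints hdfs://nn:8020
    // The bug: using rootDir's FS for WAL operations.
    System.out.println(fsFor(rootDir)); // prints wasb://container@account.blob.core.windows.net
  }
}
```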





[jira] [Created] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem

2018-11-08 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21457:
--

 Summary: BackupUtils#getWALFilesOlderThan refers to wrong 
FileSystem
 Key: HBASE-21457
 URL: https://issues.apache.org/jira/browse/HBASE-21457
 Project: HBase
  Issue Type: Bug
Reporter: Janos Gub


Janos reported a backup test failure when using a local HDFS for WALs while 
using WASB/ADLS only for store files.

He spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase 
root dir for retrieving WAL files.

We should use the helper methods from CommonFSUtils instead.
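A minimal sketch of the intended lookup order, with a plain map standing in for a Hadoop Configuration. The fallback mirrors the CommonFSUtils idea, but the method body is an editor's assumption, not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the intended lookup order, with a plain map standing in for a
// Hadoop Configuration. The fallback mirrors the CommonFSUtils idea; the
// method body is an editor's assumption, not the actual implementation.
public class WalRootSketch {
  // Fall back to hbase.rootdir only when no separate WAL dir is configured.
  static String getWALRootDir(Map<String, String> conf) {
    String walDir = conf.get("hbase.wal.dir");
    return walDir != null ? walDir : conf.get("hbase.rootdir");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("hbase.rootdir", "wasb://store@account/hbase");
    conf.put("hbase.wal.dir", "hdfs://nn:8020/hbase-wals");
    // WAL listing must start here, not under the store-file root.
    System.out.println(getWALRootDir(conf)); // prints hdfs://nn:8020/hbase-wals
  }
}
```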





[jira] [Reopened] (HBASE-21247) Custom Meta WAL Provider doesn't default to custom WAL Provider whose configuration value is outside the enums in Providers

2018-11-06 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21247:


> Custom Meta WAL Provider doesn't default to custom WAL Provider whose 
> configuration value is outside the enums in Providers
> ---
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.2
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.branch-2.patch, 21247.v1.txt, 21247.v10.txt, 
> 21247.v11.txt, 21247.v2.txt, 21247.v3.txt, 21247.v4.tst, 21247.v4.txt, 
> 21247.v5.txt, 21247.v6.txt, 21247.v7.txt, 21247.v8.txt, 21247.v9.txt
>
>
> Currently all the WAL providers acceptable to HBase are specified in the 
> Providers enum of WALFactory.
> This restricts the ability of a custom meta WAL provider to default to a 
> custom WAL provider that is supplied by class name.
> This issue fixes the bug by allowing the specification of a new WAL provider 
> class name via the config "hbase.wal.provider".
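The resolution order can be sketched like this; the enum values and fallback shape are assumptions for illustration, not the exact WALFactory code:

```java
// Illustrative resolution order (the enum values and fallback shape are
// assumptions, not the exact WALFactory code): try the built-in Providers enum
// first; if the configured value is not an enum name, treat it as a provider
// class name.
public class WalProviderResolve {
  enum Providers { defaultProvider, filesystem, multiwal, asyncfs }

  static String resolve(String configured) {
    try {
      return "enum:" + Providers.valueOf(configured).name();
    } catch (IllegalArgumentException e) {
      // Not a known enum value: fall back to loading it as a class name.
      return "class:" + configured;
    }
  }

  public static void main(String[] args) {
    System.out.println(resolve("filesystem"));        // prints enum:filesystem
    System.out.println(resolve("com.example.MyWAL")); // prints class:com.example.MyWAL
  }
}
```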





[jira] [Created] (HBASE-21438) TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible

2018-11-05 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21438:
--

 Summary: TestAdmin2#testGetProcedures fails due to FailedProcedure 
inaccessible
 Key: HBASE-21438
 URL: https://issues.apache.org/jira/browse/HBASE-21438
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


From 
https://builds.apache.org/job/HBase-Flaky-Tests/job/master/1863/testReport/org.apache.hadoop.hbase.client/TestAdmin2/testGetProcedures/ :
{code}
Mon Nov 05 04:52:13 UTC 2018, RpcRetryingCaller{globalStartTime=1541393533029, 
pause=250, maxAttempts=7}, 
org.apache.hadoop.hbase.procedure2.BadProcedureException: 
org.apache.hadoop.hbase.procedure2.BadProcedureException: The procedure class 
org.apache.hadoop.hbase.procedure2.FailedProcedure must be accessible and have 
an empty constructor
 at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.validateClass(ProcedureUtil.java:82)
 at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProtoProcedure(ProcedureUtil.java:162)
 at 
org.apache.hadoop.hbase.master.MasterRpcServices.getProcedures(MasterRpcServices.java:1249)
 at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}





[jira] [Created] (HBASE-21416) Intermittent TestRegionInfoDisplay failure due to shift in relTime of RegionState#toDescriptiveString

2018-10-31 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21416:
--

 Summary: Intermittent TestRegionInfoDisplay failure due to shift 
in relTime of RegionState#toDescriptiveString
 Key: HBASE-21416
 URL: https://issues.apache.org/jira/browse/HBASE-21416
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


From 
https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2.1/1799/testReport/junit/org.apache.hadoop.hbase.client/TestRegionInfoDisplay/testRegionDetailsForDisplay/ :
{code}
org.junit.ComparisonFailure: expected:<...:30 UTC 2018 (PT0.00[6]S ago), 
server=null> but was:<...:30 UTC 2018 (PT0.00[7]S ago), server=null>
at 
org.apache.hadoop.hbase.client.TestRegionInfoDisplay.testRegionDetailsForDisplay(TestRegionInfoDisplay.java:78)
{code}
Here is how toDescriptiveString composes relTime:
{code}
long relTime = System.currentTimeMillis() - stamp;
{code}
In the test, state.toDescriptiveString() is called twice for the assertion; 
different return values from System.currentTimeMillis() across the two calls 
caused the assertion to fail in the occasion above.
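One way to make such an assertion robust, sketched as an editor's illustration rather than the committed fix, is to blank out the wall-clock-dependent relative-time portion before comparing:

```java
// Illustrative fix for the flaky comparison (not the actual test code): blank
// out the wall-clock-dependent relative-time portion before asserting equality.
public class RelTimeNormalize {
  static String normalize(String descriptive) {
    // "(PT0.006S ago)" varies from run to run; replace it with a placeholder.
    return descriptive.replaceAll("\\(PT[^)]*S ago\\)", "(<reltime> ago)");
  }

  public static void main(String[] args) {
    String a = "... UTC 2018 (PT0.006S ago), server=null";
    String b = "... UTC 2018 (PT0.007S ago), server=null";
    System.out.println(normalize(a).equals(normalize(b))); // prints true
  }
}
```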





[jira] [Resolved] (HBASE-21180) findbugs incurs DataflowAnalysisException for hbase-server module

2018-10-29 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-21180.

Resolution: Cannot Reproduce

> findbugs incurs DataflowAnalysisException for hbase-server module
> -
>
> Key: HBASE-21180
> URL: https://issues.apache.org/jira/browse/HBASE-21180
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Priority: Minor
>
> Running findbugs, I noticed the following in hbase-server module:
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.4:findbugs (default-cli) @ hbase-server 
> ---
> [INFO] Fork Value is true
>  [java] The following errors occurred during analysis:
>  [java]   Error generating derefs for 
> org.apache.hadoop.hbase.generated.master.table_jsp._jspService(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>  [java] edu.umd.cs.findbugs.ba.DataflowAnalysisException: can't get 
> position -1 of stack
>  [java]   At 
> edu.umd.cs.findbugs.ba.Frame.getStackValue(Frame.java:250)
>  [java]   At 
> edu.umd.cs.findbugs.ba.Hierarchy.resolveMethodCallTargets(Hierarchy.java:743)
>  [java]   At 
> edu.umd.cs.findbugs.ba.npe.DerefFinder.getAnalysis(DerefFinder.java:141)
>  [java]   At 
> edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:50)
>  [java]   At 
> edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:31)
>  [java]   At 
> edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:369)
>  [java]   At 
> edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:322)
>  [java]   At 
> edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysis(ClassContext.java:1005)
>  [java]   At 
> edu.umd.cs.findbugs.ba.ClassContext.getUsagesRequiringNonNullValues(ClassContext.java:325)
>  [java]   At 
> edu.umd.cs.findbugs.detect.FindNullDeref.foundGuaranteedNullDeref(FindNullDeref.java:1510)
>  [java]   At 
> edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.reportBugs(NullDerefAndRedundantComparisonFinder.java:361)
>  [java]   At 
> edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.examineNullValues(NullDerefAndRedundantComparisonFinder.java:266)
>  [java]   At 
> edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.execute(NullDerefAndRedundantComparisonFinder.java:164)
>  [java]   At 
> edu.umd.cs.findbugs.detect.FindNullDeref.analyzeMethod(FindNullDeref.java:278)
>  [java]   At 
> edu.umd.cs.findbugs.detect.FindNullDeref.visitClassContext(FindNullDeref.java:209)
>  [java]   At 
> edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:76)
>  [java]   At 
> edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:1089)
>  [java]   At edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:283)
>  [java]   At edu.umd.cs.findbugs.FindBugs.runMain(FindBugs.java:393)
>  [java]   At edu.umd.cs.findbugs.FindBugs2.main(FindBugs2.java:1200)
>  [java] The following classes needed for analysis were missing:
>  [java]   accept
>  [java]   apply
>  [java]   run
>  [java]   test
>  [java]   call
>  [java]   exec
>  [java]   getAsInt
>  [java]   applyAsLong
>  [java]   storeFile
>  [java]   get
>  [java]   visit
>  [java]   compare
> {code}





[jira] [Created] (HBASE-21387) Race condition in snapshot cache refreshing leads to loss of snapshot files

2018-10-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21387:
--

 Summary: Race condition in snapshot cache refreshing leads to loss 
of snapshot files
 Key: HBASE-21387
 URL: https://issues.apache.org/jira/browse/HBASE-21387
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


A recent customer report showed ExportSnapshot failing:
{code}
2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
snapshot.SnapshotReferenceUtil: Can't find hfile: 
44f6c3c646e84de6a63fe30da4fcb3aa in the real 
(hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 or archive 
(hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 directory for the primary table. 
{code}
We found the following in the log:
{code}
2018-10-09 18:54:23,675 DEBUG 
[00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
cleaner.HFileCleaner: Removing: 
hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
from archive
{code}
The root cause is a race condition around SnapshotFileCache#refreshCache().
There are two callers of refreshCache: one from RefreshCacheTask#run and the 
other from SnapshotHFileCleaner.
Let's look at the code of refreshCache:
{code}
// if the snapshot directory wasn't modified since we last check, we are 
done
if (dirStatus.getModificationTime() <= this.lastModifiedTime) return;

// 1. update the modified time
this.lastModifiedTime = dirStatus.getModificationTime();

// 2.clear the cache
this.cache.clear();
{code}
Suppose the RefreshCacheTask runs past the if check and sets 
this.lastModifiedTime.
The cleaner then executes refreshCache and returns immediately, since 
this.lastModifiedTime matches the modification time of the directory.
Now the RefreshCacheTask clears the cache. By the time the cleaner performs the 
cache lookup, the cache is empty.
Therefore the cleaner puts the file into unReferencedFiles - leading to data loss.
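A sketch of one possible repair, illustrative rather than the committed patch: keep the timestamp check, the cache rebuild, and the lookup under a single lock, so a concurrent refresh cannot clear the cache between the cleaner's refresh and its lookup.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Sketch of one possible repair (illustrative, not the committed patch): the
// timestamp check, the cache rebuild, and the lookup all run under one lock,
// so a concurrent refresh can no longer clear the cache between the cleaner's
// refresh and its lookup.
public class SafeSnapshotCache {
  private long lastModifiedTime = -1;
  private final Set<String> cache = new HashSet<>();

  private synchronized void refreshCache(long dirModTime, Collection<String> snapshotFiles) {
    // Unchanged since the last refresh: nothing to do.
    if (dirModTime <= lastModifiedTime) return;
    lastModifiedTime = dirModTime;
    cache.clear();
    cache.addAll(snapshotFiles);
  }

  // Refresh and lookup happen atomically with respect to other callers.
  public synchronized boolean isUnreferenced(String file, long dirModTime,
      Collection<String> snapshotFiles) {
    refreshCache(dirModTime, snapshotFiles);
    return !cache.contains(file);
  }

  public static void main(String[] args) {
    SafeSnapshotCache c = new SafeSnapshotCache();
    System.out.println(c.isUnreferenced("f1", 100L, Arrays.asList("f1"))); // prints false
    System.out.println(c.isUnreferenced("f2", 100L, Arrays.asList("f1"))); // prints true
  }
}
```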





[jira] [Reopened] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21318:


> Make RefreshHFilesClient runnable
> -
>
> Key: HBASE-21318
> URL: https://issues.apache.org/jira/browse/HBASE-21318
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0, 1.5.0, 2.1.2
>Reporter: Tak Lon (Stephen) Wu
>Assignee: Tak Lon (Stephen) Wu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-21318.master.001.patch, 
> HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, 
> HBASE-21318.master.004.patch
>
>
> Besides enabling hbase.coprocessor.region.classes with RefreshHFilesEndPoint, 
> the user can also run this client as a ToolRunner class/CLI and call refresh 
> HFiles directly.





[jira] [Resolved] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure

2018-10-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-21149.

   Resolution: Duplicate
Fix Version/s: (was: 3.0.0)

> TestIncrementalBackupWithBulkLoad may fail due to file copy failure
> ---
>
> Key: HBASE-21149
> URL: https://issues.apache.org/jira/browse/HBASE-21149
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Attachments: 21149.v2.txt, HBASE-21149-v1.patch, 
> testIncrementalBackupWithBulkLoad-output.txt
>
>
> From 
> https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/
>  :
> {code}
> 2018-09-03 11:54:30,526 ERROR [Time-limited test] 
> impl.TableBackupClient(235): Unexpected Exception : Failed copy from 
> hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
>  to hdfs://localhost:53075/backupUT/backup_1535975655488
> java.io.IOException: Failed copy from 
> hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
>  to hdfs://localhost:53075/backupUT/backup_1535975655488
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
>   at 
> org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}
> However, some part of the test output was lost:
> {code}
> 2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions
> ...[truncated 398396 chars]...
> 8)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}





[jira] [Created] (HBASE-21381) Document the hadoop versions using which backup and restore feature works

2018-10-24 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21381:
--

 Summary: Document the hadoop versions using which backup and 
restore feature works
 Key: HBASE-21381
 URL: https://issues.apache.org/jira/browse/HBASE-21381
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


HADOOP-15850 fixes a bug where CopyCommitter#concatFileChunks unconditionally 
tried to concatenate the files being DistCp'ed to the target cluster (even 
though the files are independent).

Following is the log snippet of the failed concatenation attempt:
{code}
2018-10-13 14:09:25,351 WARN  [Thread-936] mapred.LocalJobRunner$Job(590): 
job_local1795473782_0004
java.io.IOException: Inconsistent sequence file: current chunk file 
org.apache.hadoop.tools.CopyListingFileStatus@bb8826ee{hdfs://localhost:42796/user/hbase/test-data/
   
160aeab5-6bca-9f87-465e-2517a0c43119/data/default/test-1539439707496/96b5a3613d52f4df1ba87a1cef20684c/f/a7599081e835440eb7bf0dd3ef4fd7a5_SeqId_205_
 length = 5100 aclEntries  = null, xAttrs = null} doesnt match prior entry 
org.apache.hadoop.tools.CopyListingFileStatus@243d544d{hdfs://localhost:42796/user/hbase/test-data/160aeab5-6bca-9f87-465e-
   
2517a0c43119/data/default/test-1539439707496/96b5a3613d52f4df1ba87a1cef20684c/f/394e6d39a9b94b148b9089c4fb967aad_SeqId_205_
 length = 5142 aclEntries = null, xAttrs = null}
  at 
org.apache.hadoop.tools.mapred.CopyCommitter.concatFileChunks(CopyCommitter.java:276)
  at 
org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:100)
  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:567)
{code}
Backup and Restore uses DistCp to transfer files between clusters.
Without the fix from HADOOP-15850, the transfer would fail.

This issue is to document the hadoop versions which contain HADOOP-15850 so 
that users of the Backup and Restore feature know which hadoop versions they 
can use.





[jira] [Created] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport

2018-10-20 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21353:
--

 Summary: TestHBCKCommandLineParsing#testCommandWithOptions hangs 
on call to HBCK2#checkHBCKSupport
 Key: HBASE-21353
 URL: https://issues.apache.org/jira/browse/HBASE-21353
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


I noticed the following when running 
TestHBCKCommandLineParsing#testCommandWithOptions :
{code}
"main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on 
condition [0x70216000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00076d3055d8> (a 
java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564)
at 
org.apache.hadoop.hbase.client.ConnectionImplementation.(ConnectionImplementation.java:297)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229)
at 
org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown
 Source)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227)
at 
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127)
at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93)
at org.apache.hbase.HBCK2.run(HBCK2.java:352)
at 
org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62)
{code}
The test doesn't spin up an hbase cluster.
Hence the call to check hbck support hangs.

In HBCK2#run, we can refactor the code so that argument parsing is done prior 
to calling HBCK2#checkHBCKSupport.
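The suggested refactoring can be sketched as follows; the method bodies and the stand-in for the cluster connection are hypothetical, not the actual HBCK2 code. Parsing happens first, so bad options fail fast instead of hanging on a connection attempt.

```java
// Sketch of the suggested refactoring; the method bodies and the stand-in for
// the cluster connection are hypothetical, not the actual HBCK2 code.
public class Hbck2RunSketch {
  static boolean connected = false;

  static void checkHBCKSupport() {
    connected = true; // stands in for creating a cluster connection
  }

  static int run(String[] args) {
    // 1. Parse and validate arguments first.
    if (args.length == 0 || args[0].startsWith("--unknown")) {
      System.err.println("usage: hbck2 <command> [options]");
      return 1; // parse error: exit before touching the cluster
    }
    // 2. Only now talk to the cluster.
    checkHBCKSupport();
    return 0;
  }

  public static void main(String[] args) {
    int rc = run(new String[] { "--unknown-opt" });
    System.out.println(rc + " connected=" + connected); // prints "1 connected=false"
  }
}
```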





[jira] [Reopened] (HBASE-21281) Update bouncycastle dependency.

2018-10-19 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21281:


> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, 21281.addendum2.patch, 
> HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.





[jira] [Created] (HBASE-21341) DeadServer shouldn't import unshaded Preconditions

2018-10-18 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21341:
--

 Summary: DeadServer shouldn't import unshaded Preconditions
 Key: HBASE-21341
 URL: https://issues.apache.org/jira/browse/HBASE-21341
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


DeadServer currently imports the unshaded Preconditions:
{code}
import com.google.common.base.Preconditions;
{code}
We should import the shaded version of Preconditions.

This is the only place where an unshaded class from com.google.common is 
imported.





[jira] [Created] (HBASE-21279) Split TestAdminShell into several tests

2018-10-08 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21279:
--

 Summary: Split TestAdminShell into several tests
 Key: HBASE-21279
 URL: https://issues.apache.org/jira/browse/HBASE-21279
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


In the flaky test board, TestAdminShell often timed out 
(https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html).

I ran the test on Linux with SSD and reproduced the timeout (see attached test 
output).
{code}
2018-10-08 02:36:09,146 DEBUG [main] hbase.HBaseTestingUtility(351): Setting 
hbase.rootdir to 
/mnt/disk2/a/2-hbase/hbase-shell/target/test-data/a103d8e4-695c-a5a9-6690-1ef2580050f9
...
2018-10-08 02:49:09,093 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=7] 
master.MasterRpcServices(1171): Checking to see if procedure is done pid=871
Took 0.7262 seconds2018-10-08 02:49:09,324 DEBUG [PEWorker-1] 
util.FSTableDescriptors(684): Wrote into 
hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-   
d935316c9241/.tmp/data/default/hbase_shell_tests_table/.tabledesc/.tableinfo.01
2018-10-08 02:49:09,328 INFO  
[RegionOpenAndInitThread-hbase_shell_tests_table-1] regionserver.HRegion(7004): 
creating HRegion hbase_shell_tests_table HTD == 
'hbase_shell_tests_table', {NAME => 'x', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE 
=> 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', 
  CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', 
TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 
'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', 
CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', 
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'},  {NAME 
=> 'y', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR 
=> 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',  
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
'false', IN_MEMORY => 'false',  CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
'true', BLOCKSIZE => '65536'} RootDir = hdfs://localhost:43859/
user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-d935316c9241/.tmp Table name == 
hbase_shell_tests_table
^[[38;5;226mE^[[0m
===
Error: ^[[48;5;16;38;5;226;1mtest_Get_simple_status(Hbase::StatusTest)^[[0m: 
Java::JavaIo::InterruptedIOException: Interrupt while waiting on Operation: 
CREATE, Table Name:  default:hbase_shell_tests_table, procId: 871
2018-10-08 02:49:09,361 INFO  [Block report processor] 
blockmanagement.BlockManager(2645): BLOCK* addStoredBlock: blockMap updated: 
127.0.0.1:41338 is added to   
blk_1073742193_1369{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, 
replicas=[ReplicaUC[[DISK]DS-ecc89143-e0a5-4a1c-b552-120be2561334:NORMAL:127.0.0.1:
   41338|RBW]]} size 58
> TEST TIMED OUT. PRINTING THREAD DUMP. <
{code}
We can see that procedure #871 wasn't stuck: the timeout kicked in and stopped 
the test.

We should split the current test into two (or more) test files (with 
corresponding .rb scripts) so that the execution time consistently stays under 
the limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21272) Re-add assertions for RS Group admin tests

2018-10-05 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21272:
--

 Summary: Re-add assertions for RS Group admin tests
 Key: HBASE-21272
 URL: https://issues.apache.org/jira/browse/HBASE-21272
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 1.5.0


The checked-in version of HBASE-21258 for branch-1 didn't include assertions 
for the adding / removing RS group coprocessor hook calls.

This issue is to add the assertions to the corresponding tests in TestRSGroupsAdmin1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21221:


> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, 
> 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests

2018-09-30 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21261:
--

 Summary: Add log4j.properties for hbase-rsgroup tests
 Key: HBASE-21261
 URL: https://issues.apache.org/jira/browse/HBASE-21261
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log.
It turns out that there is no log4j.properties under 
hbase-rsgroup/src/test/resources.

This issue adds log4j.properties for hbase-rsgroup tests.

This would be useful when finding root cause for hbase-rsgroup test failure(s).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-29 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21207:


> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 21207.branch-1.addendum.patch, 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, 
> HBASE-21207-branch-1.patch, HBASE-21207-branch-1.v1.patch, 
> HBASE-21207-branch-2.v1.patch, HBASE-21207.patch, HBASE-21207.patch, 
> HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In Master UI, we can see region server details like requests per seconds and 
> number of regions etc. Similarly, for tables also we can see online regions , 
> offline regions.
> It will help ops people in determining hot spotting if we can provide sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-09-29 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21258:
--

 Summary: Add resetting of flags for RS Group pre/post hooks in 
TestRSGroups
 Key: HBASE-21258
 URL: https://issues.apache.org/jira/browse/HBASE-21258
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu


During review of HBASE-20627, [~xucang] reminded me that the resetting of flags 
for RS Group pre/post hooks in TestRSGroups was absent.

This issue is to add the resetting of these flags before each subtest starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21247:
--

 Summary: Allow WAL Provider to be specified by configuration 
without explicit enum in Providers
 Key: HBASE-21247
 URL: https://issues.apache.org/jira/browse/HBASE-21247
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 21247.v1.txt

Currently all the WAL Providers acceptable to HBase are specified in the 
Providers enum of WALFactory.
This restricts the ability to supply additional WAL Providers by class name.

This issue introduces an additional config which allows a new WAL Provider to 
be specified through its class name.
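The lookup described above can be sketched as a two-step resolution: try the enum shortcut first, then fall back to loading the configured value as a class name. This is a minimal self-contained sketch; the `WALProvider` interface, the `FileWalProvider` class, and the single-entry enum here are stand-ins, not the real WALFactory types.

```java
public class WalProviderResolverSketch {
  // Hypothetical stand-in for the real WALProvider interface.
  interface WALProvider {}
  static class FileWalProvider implements WALProvider {}

  // Mirrors the shortcut enum kept in WALFactory (only one entry shown).
  enum Providers {
    filesystem(FileWalProvider.class);
    final Class<? extends WALProvider> clazz;
    Providers(Class<? extends WALProvider> clazz) { this.clazz = clazz; }
  }

  // Resolve the configured value: first as an enum shortcut, then as a fully
  // qualified class name, so new providers need no enum entry.
  static Class<? extends WALProvider> resolve(String value) throws ClassNotFoundException {
    try {
      return Providers.valueOf(value).clazz;
    } catch (IllegalArgumentException notAShortcut) {
      return Class.forName(value).asSubclass(WALProvider.class);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(resolve("filesystem").getSimpleName());
  }
}
```

With this shape, existing configs using enum names keep working while a custom provider only needs to be on the classpath.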



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21246) Introduce WALIdentity interface

2018-09-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21246:
--

 Summary: Introduce WALIdentity interface
 Key: HBASE-21246
 URL: https://issues.apache.org/jira/browse/HBASE-21246
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Ted Yu


We are introducing the WALIdentity interface so that the WAL representation can 
be decoupled from the distributed filesystem.

The interface provides a getName method whose return value can represent the 
filename in a distributed filesystem environment or the name of the stream when 
the WAL is backed by a log stream.
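The description above can be sketched as an interface with one implementation per backing store. Only getName() is described in the issue; the two implementations and their names below are hypothetical illustrations.

```java
// Sketch of the proposed interface; only getName() comes from the issue text.
interface WALIdentity {
  /** Filename on a distributed filesystem, or the stream name when the WAL is
   *  backed by a log stream. */
  String getName();
}

public class WalIdentitySketch {
  // Hypothetical filesystem-backed identity.
  static class FsWalIdentity implements WALIdentity {
    private final String path;
    FsWalIdentity(String path) { this.path = path; }
    @Override public String getName() { return path; }
  }

  // Hypothetical stream-backed identity.
  static class StreamWalIdentity implements WALIdentity {
    private final String streamName;
    StreamWalIdentity(String streamName) { this.streamName = streamName; }
    @Override public String getName() { return streamName; }
  }

  public static void main(String[] args) {
    WALIdentity fs = new FsWalIdentity("hdfs://ns/hbase/WALs/rs1/wal.1");
    WALIdentity stream = new StreamWalIdentity("wal-stream-rs1");
    System.out.println(fs.getName() + " / " + stream.getName());
  }
}
```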




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21238) MapReduceHFileSplitterJob#run shouldn't call System.exit

2018-09-26 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21238:
--

 Summary: MapReduceHFileSplitterJob#run shouldn't call System.exit
 Key: HBASE-21238
 URL: https://issues.apache.org/jira/browse/HBASE-21238
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


{code}
if (args.length < 2) {
  usage("Wrong number of arguments: " + args.length);
  System.exit(-1);
{code}
The correct way to handle the error condition is through the return value of the run method.
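A minimal sketch of that pattern, assuming the Hadoop Tool contract: the Tool interface is re-declared here so the example is self-contained (the real one is org.apache.hadoop.util.Tool, and the real job is MapReduceHFileSplitterJob).

```java
// Minimal stand-in for org.apache.hadoop.util.Tool, re-declared for a
// self-contained demo.
interface Tool {
  int run(String[] args) throws Exception;
}

public class HFileSplitterRunSketch implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("Wrong number of arguments: " + args.length);
      return -1; // report failure through the return value, not System.exit
    }
    // ... set up and submit the job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // Only the outermost main should turn the result into a process exit
    // status (ToolRunner.run does this in the real job).
    int rc = new HFileSplitterRunSketch().run(args);
    System.out.println("exit code: " + rc);
  }
}
```

Returning the code keeps run() usable from other callers (tests, chained jobs) that would otherwise be killed by System.exit.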



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21230) BackupUtils#checkTargetDir doesn't compose error message correctly

2018-09-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21230:
--

 Summary: BackupUtils#checkTargetDir doesn't compose error message 
correctly
 Key: HBASE-21230
 URL: https://issues.apache.org/jira/browse/HBASE-21230
 Project: HBase
  Issue Type: Bug
  Components: backuprestore
Reporter: Ted Yu


Here is related code:
{code}
  String expMsg = e.getMessage();
  String newMsg = null;
  if (expMsg.contains("No FileSystem for scheme")) {
newMsg =
"Unsupported filesystem scheme found in the backup target url. 
Error Message: "
+ newMsg;
{code}
I think the intention was to concatenate expMsg, rather than newMsg (which is still null at that point), at the end of the message.
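A self-contained sketch of the corrected logic; rewriteMessage is a hypothetical helper mirroring the message handling in BackupUtils#checkTargetDir, where expMsg is the caught exception's message.

```java
public class BackupMessageSketch {
  // Hypothetical helper mirroring BackupUtils#checkTargetDir's handling.
  static String rewriteMessage(String expMsg) {
    if (expMsg != null && expMsg.contains("No FileSystem for scheme")) {
      // Concatenate expMsg at the end, not the still-null newMsg.
      return "Unsupported filesystem scheme found in the backup target url."
          + " Error Message: " + expMsg;
    }
    return expMsg;
  }

  public static void main(String[] args) {
    System.out.println(rewriteMessage("No FileSystem for scheme: s3x"));
  }
}
```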



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-16627) AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table exists

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-16627.

Resolution: Later

> AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table 
> exists
> 
>
> Key: HBASE-16627
> URL: https://issues.apache.org/jira/browse/HBASE-16627
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang
>Priority: Minor
>
> [~stack] first reported this issue when he played with backup feature.
> The following exception can be observed in backup unit tests:
> {code}
> 2016-09-13 16:21:57,661 ERROR [ProcedureExecutor-3] 
> master.TableStateManager(134): Unable to get table hbase:backup state
> org.apache.hadoop.hbase.TableNotFoundException: hbase:backup
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:174)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.isDisabledorDisablingRegionInRIT(AssignmentManager.java:1221)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:739)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1567)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1546)
> at 
> org.apache.hadoop.hbase.util.ModifyRegionUtils.assignRegions(ModifyRegionUtils.java:254)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.assignRegions(CreateTableProcedure.java:430)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:127)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:57)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1066)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
> {code}
> AssignmentManager#isDisabledorDisablingRegionInRIT should take table 
> existence into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21221:
--

 Summary: Ineffective assertion in 
TestFromClientSide3#testMultiRowMutations
 Key: HBASE-21221
 URL: https://issues.apache.org/jira/browse/HBASE-21221
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


Observed the following in 
org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
{code}
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
089bdfa75f44d88e596479038a6da18b
  at 
org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
  at 
org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
  at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
  at 
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
  at 
org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
  at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
...
Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
should fail because the target lock is blocked by previous put
  at org.junit.Assert.fail(Assert.java:88)
  at 
org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}
Here is related code:
{code}
  cpService.execute(() -> {
...
if (!threw) {
  // Can't call fail() earlier because the catch would eat it.
  fail("This cp should fail because the target lock is blocked by 
previous put");
}
{code}
Since the fail() call is executed by the cpService, the assertion had no 
bearing on the outcome of the test.
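One way to make such a failure visible on the test's own thread is to submit the task and rethrow through the Future. This is a self-contained sketch of that pattern, not the actual TestFromClientSide3 fix; the surface helper and the `threw` flag are illustrative stand-ins.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorAssertionSketch {
  /** Waits for the task and returns the Throwable it threw, or null. */
  static Throwable surface(Future<?> future) throws InterruptedException {
    try {
      future.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService cpService = Executors.newSingleThreadExecutor();
    // submit() (rather than execute()) hands any AssertionError to the Future,
    // so the main test thread can fail the test instead of the pool thread
    // printing the error and moving on.
    Future<?> task = cpService.submit(() -> {
      boolean threw = false; // stands in for the real outcome of the cp call
      if (!threw) {
        throw new AssertionError(
            "This cp should fail because the target lock is blocked by previous put");
      }
    });
    Throwable failure = surface(task);
    System.out.println(failure == null ? "passed" : "failed: " + failure.getMessage());
    cpService.shutdown();
  }
}
```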



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21216) TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky

2018-09-20 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21216:
--

 Summary: TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky
 Key: HBASE-21216
 URL: https://issues.apache.org/jira/browse/HBASE-21216
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


From 
https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2/794/testReport/junit/org.apache.hadoop.hbase.master.cleaner/TestSnapshotFromMaster/testSnapshotHFileArchiving/ :
{code}
java.lang.AssertionError: Archived hfiles [] and table hfiles 
[9ca09392705f425f9c916beedc10d63c] is missing snapshot 
file:6739a09747e54189a4112a6d8f37e894
at 
org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster.testSnapshotHFileArchiving(TestSnapshotFromMaster.java:370)
{code}
The file appeared in the archive dir before the hfile cleaners were run:
{code}
2018-09-20 10:38:53,187 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|-archive/
2018-09-20 10:38:53,188 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|data/
2018-09-20 10:38:53,189 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|---default/
2018-09-20 10:38:53,190 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|--test/
2018-09-20 10:38:53,191 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|-1237d57b63a7bdf067a930441a02514a/
2018-09-20 10:38:53,192 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|recovered.edits/
2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(774): 
|---4.seqid
2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|-29e1700e09b51223ad2f5811105a4d51/
2018-09-20 10:38:53,194 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|fam/
2018-09-20 10:38:53,195 DEBUG [Time-limited test] util.CommonFSUtils(774): 
|---2c66a18f6c1a4074b84ffbb3245268c4
2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
|---45bb396c6a5e49629e45a4d56f1e9b14
2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
|---6739a09747e54189a4112a6d8f37e894
{code}
However, the archive dir became empty after the hfile cleaners were run:
{code}
2018-09-20 10:38:53,312 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|-archive/
2018-09-20 10:38:53,313 DEBUG [Time-limited test] util.CommonFSUtils(771): 
|-corrupt/
{code}
Leading to the assertion failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21198) Exclude dependency on net.minidev:json-smart

2018-09-14 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21198:
--

 Summary: Exclude dependency on net.minidev:json-smart
 Key: HBASE-21198
 URL: https://issues.apache.org/jira/browse/HBASE-21198
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


From 
https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt :
{code}
[ERROR] Failed to execute goal on project hbase-common: Could not resolve 
dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: 
Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 -> 
org.apache.hadoop:hadoop-auth:jar:3.0.0 -> 
com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> 
net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor for 
net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact 
net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon 
(https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied to: 
https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom
 , ReasonPhrase:Forbidden. -> [Help 1]
{code}
We should exclude the dependency on net.minidev:json-smart.

hbase-common/bin/pom.xml already does so.

The other pom.xml files should do the same.
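A hedged sketch of what such an exclusion might look like, using the coordinates reported in the build log; anchoring it on the hadoop-common dependency is an assumption, since the actual dependency declarations vary per module.

```xml
<!-- Sketch: cut the transitive net.minidev:json-smart pulled in via
     hadoop-common -> hadoop-auth -> nimbus-jose-jwt (per the build log). -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <exclusions>
    <exclusion>
      <groupId>net.minidev</groupId>
      <artifactId>json-smart</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```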



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21194) Add TestCopyTable which exercises MOB feature

2018-09-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21194:
--

 Summary: Add TestCopyTable which exercises MOB feature
 Key: HBASE-21194
 URL: https://issues.apache.org/jira/browse/HBASE-21194
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


Currently TestCopyTable doesn't cover table(s) with the MOB feature enabled.

We should add a variant that enables MOB on the table being copied and verify 
that the MOB content is copied correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21180) findbugs incurs DataflowAnalysisException for hbase-server module

2018-09-10 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21180:
--

 Summary: findbugs incurs DataflowAnalysisException for 
hbase-server module
 Key: HBASE-21180
 URL: https://issues.apache.org/jira/browse/HBASE-21180
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


Running findbugs, I noticed the following in hbase-server module:
{code}
[INFO] --- findbugs-maven-plugin:3.0.4:findbugs (default-cli) @ hbase-server ---
[INFO] Fork Value is true
 [java] The following errors occurred during analysis:
 [java]   Error generating derefs for 
org.apache.hadoop.hbase.generated.master.table_jsp._jspService(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
 [java] edu.umd.cs.findbugs.ba.DataflowAnalysisException: can't get 
position -1 of stack
 [java]   At edu.umd.cs.findbugs.ba.Frame.getStackValue(Frame.java:250)
 [java]   At 
edu.umd.cs.findbugs.ba.Hierarchy.resolveMethodCallTargets(Hierarchy.java:743)
 [java]   At 
edu.umd.cs.findbugs.ba.npe.DerefFinder.getAnalysis(DerefFinder.java:141)
 [java]   At 
edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:50)
 [java]   At 
edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:31)
 [java]   At 
edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:369)
 [java]   At 
edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:322)
 [java]   At 
edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysis(ClassContext.java:1005)
 [java]   At 
edu.umd.cs.findbugs.ba.ClassContext.getUsagesRequiringNonNullValues(ClassContext.java:325)
 [java]   At 
edu.umd.cs.findbugs.detect.FindNullDeref.foundGuaranteedNullDeref(FindNullDeref.java:1510)
 [java]   At 
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.reportBugs(NullDerefAndRedundantComparisonFinder.java:361)
 [java]   At 
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.examineNullValues(NullDerefAndRedundantComparisonFinder.java:266)
 [java]   At 
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.execute(NullDerefAndRedundantComparisonFinder.java:164)
 [java]   At 
edu.umd.cs.findbugs.detect.FindNullDeref.analyzeMethod(FindNullDeref.java:278)
 [java]   At 
edu.umd.cs.findbugs.detect.FindNullDeref.visitClassContext(FindNullDeref.java:209)
 [java]   At 
edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:76)
 [java]   At 
edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:1089)
 [java]   At edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:283)
 [java]   At edu.umd.cs.findbugs.FindBugs.runMain(FindBugs.java:393)
 [java]   At edu.umd.cs.findbugs.FindBugs2.main(FindBugs2.java:1200)
 [java] The following classes needed for analysis were missing:
 [java]   accept
 [java]   apply
 [java]   run
 [java]   test
 [java]   call
 [java]   exec
 [java]   getAsInt
 [java]   applyAsLong
 [java]   storeFile
 [java]   get
 [java]   visit
 [java]   compare
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21175) Partially initialized SnapshotHFileCleaner leads to NPE during TestHFileArchiving

2018-09-09 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21175:
--

 Summary: Partially initialized SnapshotHFileCleaner leads to NPE 
during TestHFileArchiving
 Key: HBASE-21175
 URL: https://issues.apache.org/jira/browse/HBASE-21175
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


TestHFileArchiving#testCleaningRace creates an HFileCleaner instance within the 
test.
When SnapshotHFileCleaner.init() is called, no master parameter is passed in 
{{params}}.

When the chore runs the cleaner during the test, an NPE comes out of this line 
in getDeletableFiles():
{code}
  return cache.getUnreferencedFiles(files, master.getSnapshotManager());
{code}
since master is null.

We should either check for a null master or pass the master instance properly 
when constructing the cleaner instance.
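The null-check option might look like the sketch below: with no master there is no way to consult the snapshot manager, so the safe fallback is to treat nothing as deletable. The MasterServices/SnapshotManager interfaces here are simplified stand-ins for the HBase types, and filterUnreferenced is a placeholder for the cache lookup.

```java
import java.util.Collections;
import java.util.List;

public class NullMasterGuardSketch {
  // Hypothetical stand-ins for the HBase types involved.
  interface SnapshotManager {}
  interface MasterServices { SnapshotManager getSnapshotManager(); }

  MasterServices master; // stays null when init() got no "master" in params

  // Defensive getDeletableFiles: without a master, report no deletable files
  // instead of throwing an NPE from master.getSnapshotManager().
  List<String> getDeletableFiles(List<String> files) {
    if (master == null) {
      return Collections.emptyList();
    }
    return filterUnreferenced(files, master.getSnapshotManager());
  }

  List<String> filterUnreferenced(List<String> files, SnapshotManager sm) {
    return files; // placeholder for cache.getUnreferencedFiles(files, sm)
  }

  public static void main(String[] args) {
    NullMasterGuardSketch cleaner = new NullMasterGuardSketch();
    System.out.println(cleaner.getDeletableFiles(java.util.Arrays.asList("hfile1")).size());
  }
}
```

Erring toward "not deletable" is the conservative choice for a cleaner: a leaked file costs disk space, while a wrongly deleted one loses snapshot data.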



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21129) Clean up duplicate codes in #equals and #hashCode methods of Filter

2018-09-06 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-21129.

Resolution: Fixed

> Clean up duplicate codes in #equals and #hashCode methods of Filter
> ---
>
> Key: HBASE-21129
> URL: https://issues.apache.org/jira/browse/HBASE-21129
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21129.addendum, HBASE-21129.master.001.patch, 
> HBASE-21129.master.002.patch, HBASE-21129.master.003.patch, 
> HBASE-21129.master.004.patch, HBASE-21129.master.005.patch, 
> HBASE-21129.master.006.patch, HBASE-21129.master.007.patch, 
> HBASE-21129.master.008.patch
>
>
> It is a follow-up of HBASE-19008, aiming to clean up duplicate codes in 
> #equals and #hashCode methods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored

2018-09-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21160:
--

 Summary: Assertion in 
TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels 
is ignored
 Key: HBASE-21160
 URL: https://issues.apache.org/jira/browse/HBASE-21160
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


From 
https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt 
(HBASE-21138 QA run):
{code}
[WARNING] 
/testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25]
 [AssertionFailureIgnored] This assertion throws an AssertionError if it fails, 
which will be caught by an enclosing try block.
{code}
Here is related code:
{code}
  PrivilegedExceptionAction scanAction = new 
PrivilegedExceptionAction() {
@Override
public Void run() throws Exception {
  try (Connection connection = ConnectionFactory.createConnection(conf);
...
assertEquals(1, next.length);
  } catch (Throwable t) {
throw new IOException(t);
  }
{code}
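A common fix for this warning is to rethrow AssertionError as-is before the broad catch, so the wrapper no longer hides the test failure. This is a self-contained sketch of the idea under that assumption; scanAndAssert stands in for the scan-and-assert body, not the actual TestVisibilityLabelsWithDeletes code.

```java
import java.io.IOException;

public class RethrowAssertionSketch {
  // Rethrow AssertionError unchanged so the enclosing catch (Throwable) no
  // longer converts a test failure into an IOException.
  static void scanAndAssert(int resultCount) throws IOException {
    try {
      if (resultCount != 1) { // stands in for assertEquals(1, next.length)
        throw new AssertionError("expected 1 result, got " + resultCount);
      }
    } catch (AssertionError ae) {
      throw ae; // keep the failure visible to the test runner
    } catch (Throwable t) {
      throw new IOException(t);
    }
  }

  public static void main(String[] args) throws IOException {
    scanAndAssert(1);
    System.out.println("assertion passed");
  }
}
```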



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21150) Avoid delay in first flushes due to overheads in table metrics registration

2018-09-04 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21150:


I didn't open this issue for backporting.

HBASE-15728 is still in master and the delay in first flushes is still there.

> Avoid delay in first flushes due to overheads in table metrics registration
> ---
>
> Key: HBASE-21150
> URL: https://issues.apache.org/jira/browse/HBASE-21150
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21150.v1.txt, 21150.v2.txt, 21150.v3.txt
>
>
> After HBASE-15728 was integrated, the lazy table metrics registration results 
> in a penalty for the first flushes.
> Excerpt from log shows delay (note the same timestamp 08:18:23,234) :
> {code:java}
> 2018-09-02 08:18:23,232 DEBUG 
> [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
> regionserver.MetricsTableSourceImpl(124): Creating new  
> MetricsTableSourceImpl for table 'testtb-1535901500805'
> 2018-09-02 08:18:23,233 DEBUG 
> [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
> regionserver.MetricsTableSourceImpl(137): registering metrics for testtb-   
> 1535901500805
> 2018-09-02 08:18:23,234 INFO  
> [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
> regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343,   
> heapSize ~5.16 KB/5280, currentSize=0 B/0 for 
> fa403f6a4fb8dbc1a1c389744fce2d58 in 280ms, sequenceid=5, compaction 
> requested=false
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup]
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to register  
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup]
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup]
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup]
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register  
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup]
> 2018-09-02 08:18:23,234 DEBUG 
> [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
> regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
> testtb-1535901500805 
> Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup]
> {code}
> This is a regression.
> When the first region of the table is opened on a region server, we can 
> proactively register the table metrics.
> This would avoid the penalty on the first flushes for the table.
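The eager registration proposed in the quoted description could be sketched as below. The registry, method names, and region-open hook are hypothetical stand-ins for MetricsTableAggregateSourceImpl and the real region-open path.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EagerTableMetricsSketch {
  // Hypothetical stand-in for the per-table registry kept by
  // MetricsTableAggregateSourceImpl.
  private final ConcurrentMap<String, Object> metricsByTable = new ConcurrentHashMap<>();

  // Called when the first region of a table opens, so the registration cost is
  // paid up front instead of inside the timed first flush.
  void onRegionOpen(String tableName) {
    metricsByTable.computeIfAbsent(tableName, t -> createTableSource(t));
  }

  Object createTableSource(String tableName) {
    return new Object(); // placeholder for new MetricsTableSourceImpl(...)
  }

  boolean isRegistered(String tableName) {
    return metricsByTable.containsKey(tableName);
  }

  public static void main(String[] args) {
    EagerTableMetricsSketch metrics = new EagerTableMetricsSketch();
    metrics.onRegionOpen("testtb-1535901500805");
    System.out.println(metrics.isRegistered("testtb-1535901500805"));
  }
}
```

computeIfAbsent also keeps the registration idempotent when several regions of the same table open concurrently, which is the contention the issue title mentions.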



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21150) Avoid delay in first flushes due to contention in table metrics registration

2018-09-04 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21150:
--

 Summary: Avoid delay in first flushes due to contention in table 
metrics registration
 Key: HBASE-21150
 URL: https://issues.apache.org/jira/browse/HBASE-21150
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


After HBASE-15728 was integrated, the lazy table metrics registration results in 
a penalty for the first flushes.
Excerpt from log shows delay (note the same timestamp 08:18:23,234) :
{code}
2018-09-02 08:18:23,232 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableSourceImpl(124): Creating new  
MetricsTableSourceImpl for table 'testtb-1535901500805'
2018-09-02 08:18:23,233 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableSourceImpl(137): registering metrics for testtb-   
1535901500805
2018-09-02 08:18:23,234 INFO  
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343,   
heapSize ~5.16 KB/5280, currentSize=0 B/0 for fa403f6a4fb8dbc1a1c389744fce2d58 
in 280ms, sequenceid=5, compaction requested=false
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup]
{code}
This is a regression.

When the first region of a table is opened on a region server, we can proactively 
register the table metrics.
This would avoid the penalty on first flushes for the table.
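The proactive-registration idea can be sketched with plain JDK classes. The class and method names below are illustrative, not the actual HBase API: the point is that the region-open path pays the registration cost once, so the first flush only performs a cheap lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of proactive table-metrics registration.
public class ProactiveTableMetrics {
    private final ConcurrentMap<String, Object> tableSources = new ConcurrentHashMap<>();

    // Called when the first region of a table opens on this region server.
    // computeIfAbsent runs the (possibly expensive) registration at most
    // once per table, off the flush path.
    public void onRegionOpen(String table) {
        tableSources.computeIfAbsent(table, t -> new Object());
    }

    // Called from the flush path; by now the source already exists.
    public Object sourceForFlush(String table) {
        return tableSources.get(table);
    }
}
```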



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure

2018-09-04 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21149:
--

 Summary: TestIncrementalBackupWithBulkLoad may fail due to file 
copy failure
 Key: HBASE-21149
 URL: https://issues.apache.org/jira/browse/HBASE-21149
 Project: HBase
  Issue Type: Test
  Components: backuprestore
Reporter: Ted Yu


From 
https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/ :
{code}
2018-09-03 11:54:30,526 ERROR [Time-limited test] impl.TableBackupClient(235): 
Unexpected Exception : Failed copy from 
hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
 to hdfs://localhost:53075/backupUT/backup_1535975655488
java.io.IOException: Failed copy from 
hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
 to hdfs://localhost:53075/backupUT/backup_1535975655488
at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351)
at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219)
at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198)
at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320)
at 
org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
at 
org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
{code}
However, some part of the test output was lost:
{code}
2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions
...[truncated 398396 chars]...
8)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup

2018-09-02 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21141:
--

 Summary: Enable MOB in backup / restore test involving incremental 
backup
 Key: HBASE-21141
 URL: https://issues.apache.org/jira/browse/HBASE-21141
 Project: HBase
  Issue Type: Test
  Components: backuprestore
Reporter: Ted Yu


Currently we have only one test (TestRemoteBackup) where the MOB feature is 
enabled, and that test performs only a full backup.

This issue is to enable MOB in backup / restore test(s) involving incremental 
backup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21139) Concurrent invocations of MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered MetricsTableSource

2018-09-01 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21139:
--

 Summary: Concurrent invocations of 
MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered 
MetricsTableSource
 Key: HBASE-21139
 URL: https://issues.apache.org/jira/browse/HBASE-21139
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


From the test output of TestRestoreFlushSnapshotFromClient:
{code}
2018-09-01 21:09:38,174 WARN  [member: 
'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] 
snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in 
SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: java.lang.NullPointerException
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at 
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324)
  at 
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
  at 
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375)
  at 
org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56)
  at 
org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416)
  at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306)
  at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209)
  at 
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115)
  at 
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
{code}
In MetricsTableAggregateSourceImpl.getOrCreateTableSource :
{code}
MetricsTableSource prev = tableSources.putIfAbsent(table, source);

if (prev != null) {
  return prev;
} else {
  // register the new metrics now
  register(source);
{code}
Suppose threads t1 and t2 execute the above code concurrently.
t1 calls putIfAbsent first and proceeds to running {{register(source)}}.
A context switch occurs; t2 reaches putIfAbsent and retrieves the instance stored 
by t1, which is not registered yet.
We would end up with the NPE shown in the stack trace above.
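One race-free alternative can be sketched with plain JDK classes (`SafeTableSources` and `Source` are hypothetical stand-ins for the HBase types): `ConcurrentHashMap.computeIfAbsent` runs the mapping function at most once per key and publishes the entry only after the function returns, so no caller can ever observe an unregistered source.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: registration completes before the entry is visible.
public class SafeTableSources {
    static class Source {
        final AtomicBoolean registered = new AtomicBoolean(false);
        void register() { registered.set(true); } // stands in for register(source)
    }

    private final ConcurrentMap<String, Source> tableSources = new ConcurrentHashMap<>();

    public Source getOrCreate(String table) {
        // The mapping function finishes (including register()) before the
        // entry becomes visible to any other caller of computeIfAbsent.
        return tableSources.computeIfAbsent(table, t -> {
            Source s = new Source();
            s.register();
            return s;
        });
    }
}
```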



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21138) Close HRegion instance at the end of every test in TestHRegion

2018-08-31 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21138:
--

 Summary: Close HRegion instance at the end of every test in 
TestHRegion
 Key: HBASE-21138
 URL: https://issues.apache.org/jira/browse/HBASE-21138
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


TestHRegion has over 100 tests.
The following is from one subtest:
{code}
  public void testCompactionAffectedByScanners() throws Exception {
byte[] family = Bytes.toBytes("family");
this.region = initHRegion(tableName, method, CONF, family);
{code}
this.region is not closed at the end of the subtest.

testToShowNPEOnRegionScannerReseek is another example.

Every subtest should use the following construct toward the end:
{code}
} finally {
  HBaseTestingUtility.closeRegionAndWAL(this.region);
{code}
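The construct can be sketched with a plain AutoCloseable stand-in for HRegion (names are illustrative; in HBase itself, HBaseTestingUtility.closeRegionAndWAL also closes the WAL): the finally block guarantees the region is closed even when the test body throws.

```java
// Hypothetical sketch of the close-in-finally discipline for a subtest.
public class RegionCloseExample {
    static class FakeRegion implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static FakeRegion runTest() {
        FakeRegion region = new FakeRegion();
        try {
            // test body that may throw
        } finally {
            // always executed, so the region never leaks even on failure
            region.close();
        }
        return region;
    }
}
```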



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1

2018-08-28 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-14783.

Resolution: Later

> Proc-V2: Master aborts when downgrading from 1.3 to 1.1
> ---
>
> Key: HBASE-14783
> URL: https://issues.apache.org/jira/browse/HBASE-14783
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang
>Priority: Major
>
> I was running ITBLL with 1.3 deployed on a 6 node cluster.
> Then I stopped the cluster, deployed the 1.1 release and tried to start the 
> cluster. However, the master failed to start due to:
> {code}
> 2015-11-06 00:58:40,351 FATAL [eval-test-2:2.activeMasterManager] 
> master.HMaster: Failed to become active master
> java.io.IOException: The procedure class 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure must be 
> accessible and have an empty constructor
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:548)
>   at org.apache.hadoop.hbase.procedure2.Procedure.convert(Procedure.java:640)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.read(ProcedureWALFormatReader.java:105)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:82)
>   at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:298)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:275)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:434)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1208)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1107)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:694)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:186)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1713)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:191)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:536)
>   ... 12 more
> {code}
> The cause was that ServerCrashProcedure, written in some WAL file under 
> MasterProcWALs during the first run, was absent from the 1.1 release.
> After a brief discussion with Stephen, I am logging this JIRA to solicit 
> discussion on how the customer experience can be improved if a downgrade of 
> hbase is performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-14716) Detection of orphaned table znode should cover table in Enabled state

2018-08-28 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-14716.

Resolution: Later

> Detection of orphaned table znode should cover table in Enabled state
> -
>
> Key: HBASE-14716
> URL: https://issues.apache.org/jira/browse/HBASE-14716
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: hbck
> Attachments: 14716-branch-1-v1.txt, 14716.branch-1.v4.txt
>
>
> HBASE-12070 introduced a fix for the orphaned table znode case where the table 
> doesn't have an entry in hbase:meta.
> When Stephen and I investigated rolling upgrade failure,
> {code}
> 2015-10-27 18:21:10,668 WARN  [ProcedureExecutorThread-3] 
> procedure.CreateTableProcedure: The table smoketest does not exist in meta 
> but has a znode. run hbck to fix inconsistencies.
> {code}
> we found that the orphaned table znode corresponded to a table in the Enabled 
> state. Therefore running hbck didn't report the inconsistency.
> Detection of orphaned table znodes should cover this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21097) Flush pressure assertion may fail in testFlushThroughputTuning

2018-08-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21097:
--

 Summary: Flush pressure assertion may fail in 
testFlushThroughputTuning 
 Key: HBASE-21097
 URL: https://issues.apache.org/jira/browse/HBASE-21097
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


From 
https://builds.apache.org/job/PreCommit-HBASE-Build/14137/artifact/patchprocess/patch-unit-hbase-server.txt :
{code}
[ERROR] 
testFlushThroughputTuning(org.apache.hadoop.hbase.regionserver.throttle.TestFlushWithThroughputController)
  Time elapsed: 17.446 s  <<< FAILURE!
java.lang.AssertionError: expected:<0.0> but was:<1.2906294173808417E-6>
at 
org.apache.hadoop.hbase.regionserver.throttle.TestFlushWithThroughputController.testFlushThroughputTuning(TestFlushWithThroughputController.java:185)
{code}
Here is the related assertion:
{code}
assertEquals(0.0, regionServer.getFlushPressure(), EPSILON);
{code}
where EPSILON = 1E-6

In the above case, the observed value exceeded EPSILON by about 2.9E-7, so the 
assertion didn't pass.
It seems the epsilon could be adjusted to accommodate different workload / 
hardware combinations.
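To make the arithmetic concrete, here is a minimal sketch of the delta-style comparison that `assertEquals(0.0, value, EPSILON)` performs for an expected value of 0.0 (`nearlyZero` is an illustrative helper, not HBase or JUnit code): the check passes iff |value| <= delta.

```java
// Illustrative helper mirroring assertEquals(0.0, actual, delta).
public class EpsilonCheck {
    static boolean nearlyZero(double value, double epsilon) {
        return Math.abs(value) <= epsilon;
    }
}
```

With the observed pressure of 1.2906294173808417E-6, EPSILON = 1E-6 fails while a tolerance of 1E-5 would pass.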



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21088) HStoreFile should be closed in HStore#hasReferences

2018-08-21 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21088:
--

 Summary: HStoreFile should be closed in HStore#hasReferences
 Key: HBASE-21088
 URL: https://issues.apache.org/jira/browse/HBASE-21088
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


{code}
  reloadedStoreFiles = loadStoreFiles();
  return StoreUtils.hasReferences(reloadedStoreFiles);
{code}
The intention of obtaining the HStoreFiles is to check for references.
The loaded HStoreFiles should be closed before returning, to prevent a leak.
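A hedged sketch of the proposed shape, with a plain AutoCloseable stand-in for HStoreFile (all names here are hypothetical): the return value is computed first, then every loaded file is closed in a finally block before the value is actually returned to the caller.

```java
import java.util.List;

// Illustrative sketch: answer the reference question, then close everything.
public class HasReferencesSketch {
    static class FakeStoreFile implements AutoCloseable {
        final boolean hasReference;
        boolean closed = false;
        FakeStoreFile(boolean ref) { hasReference = ref; }
        @Override public void close() { closed = true; }
    }

    static boolean hasReferences(List<FakeStoreFile> files) {
        try {
            return files.stream().anyMatch(f -> f.hasReference);
        } finally {
            // runs after the return value is computed but before it is returned
            for (FakeStoreFile f : files) {
                f.close();
            }
        }
    }
}
```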



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21076) TestTableResource fails with NPE

2018-08-20 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21076:
--

 Summary: TestTableResource fails with NPE
 Key: HBASE-21076
 URL: https://issues.apache.org/jira/browse/HBASE-21076
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


The following can be observed in master branch:
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.rest.TestTableResource.setUpBeforeClass(TestTableResource.java:134)
{code}
The NPE comes from the following in TestEndToEndSplitTransaction :
{code}
compactAndBlockUntilDone(TEST_UTIL.getAdmin(),
  TEST_UTIL.getMiniHBaseCluster().getRegionServer(0), 
daughterA.getRegionName());
{code}
An initial check of the code shows that TestEndToEndSplitTransaction uses a 
TEST_UTIL instance created within TestEndToEndSplitTransaction, while 
TestTableResource creates its own instance of HBaseTestingUtility.
This means TEST_UTIL.getMiniHBaseCluster() returns null, since the instance 
created by TestEndToEndSplitTransaction has hbaseCluster set to null.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21042) processor.getRowsToLock() always assumes there is some row being locked in HRegion#processRowsWithLocks

2018-08-13 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21042:
--

 Summary: processor.getRowsToLock() always assumes there is some 
row being locked in HRegion#processRowsWithLocks
 Key: HBASE-21042
 URL: https://issues.apache.org/jira/browse/HBASE-21042
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


[~tdsilva] reported at the tail of HBASE-18998 that the fix for HBASE-18998 
missed the finally block of HRegion#processRowsWithLocks.

This issue is to fix that remaining call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21040) printStackTrace() is used in RestoreDriver in case Exception is caught

2018-08-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21040:
--

 Summary: printStackTrace() is used in RestoreDriver in case 
Exception is caught
 Key: HBASE-21040
 URL: https://issues.apache.org/jira/browse/HBASE-21040
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Here is the related code:
{code}
} catch (Exception e) {
  e.printStackTrace();
{code}
The correct way to log a stack trace is to use a Logger instance.
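The preferred pattern can be sketched as follows. HBase itself uses slf4j; java.util.logging is used here only so the example stays self-contained, and the class and messages are illustrative. Passing the Throwable to the logger makes it emit the full stack trace through the logging framework instead of raw stderr.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of logging an exception instead of printStackTrace().
public class RestoreDriverLogging {
    private static final Logger LOG = Logger.getLogger(RestoreDriverLogging.class.getName());

    static String run() {
        try {
            throw new IllegalStateException("restore failed");
        } catch (Exception e) {
            // the Throwable argument makes the logger include the stack trace
            LOG.log(Level.SEVERE, "Error while running restore", e);
            return e.getMessage();
        }
    }
}
```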



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20988) TestShell shouldn't be skipped for hbase-shell module test

2018-07-31 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20988:
--

 Summary: TestShell shouldn't be skipped for hbase-shell module test
 Key: HBASE-20988
 URL: https://issues.apache.org/jira/browse/HBASE-20988
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


Here is a snippet from QA run 13862 for HBASE-20985:
{code}
13:42:50 cd /testptch/hbase/hbase-shell
13:42:50 /usr/share/maven/bin/mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hbase-master-patch-1 -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/master.normalizer.TestSimpleRegionNormalizerOnCluster.java,**/replication.regionserver.TestSerialReplicationEndpoint.java,**/master.procedure.TestServerCrashProcedure.java,**/master.procedure.TestCreateTableProcedure.java,**/TestClientOperationTimeout.java,**/client.TestSnapshotFromClientWithRegionReplicas.java,**/master.TestAssignmentManagerMetrics.java,**/client.TestShell.java,**/client.TestCloneSnapshotFromClientWithRegionReplicas.java,**/master.TestDLSFSHLog.java,**/replication.TestReplicationSmallTestsSync.java,**/master.procedure.TestModifyTableProcedure.java,**/regionserver.TestCompactionInDeadRegionServer.java,**/client.TestFromClientSide3.java,**/master.procedure.TestRestoreSnapshotProcedure.java,**/client.TestRestoreSnapshotFromClient.java,**/security.access.TestCoprocessorWhitelistMasterObserver.java,**/replication.regionserver.TestDrainReplicationQueuesForStandBy.java,**/master.procedure.TestProcedurePriority.java,**/master.locking.TestLockProcedure.java,**/master.cleaner.TestSnapshotFromMaster.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/client.TestMobRestoreSnapshotFromClient.java,**/replication.TestReplicationKillSlaveRS.java,**/regionserver.TestHRegion.java,**/security.access.TestAccessController.java,**/master.procedure.TestTruncateTableProcedure.java,**/client.TestAsyncReplicationAdminApiWithClusters.java,**/coprocessor.TestMetaTableMetrics.java,**/client.TestMobSnapshotCloneIndependence.java,**/namespace.TestNamespaceAuditor.java,**/master.TestMasterAbortAndRSGotKilled.java,**/client.TestAsyncTable.java,**/master.TestMasterOperationsForRegionReplicas.java,**/util.TestFromClientSide3WoUnsafe.java,**/client.TestSnapshotCloneIndependence.java,**/client.TestAsyncDecommissionAdminApi.java,**/client.TestRestoreSnapshotFromClientWithRegionReplicas.java,**/master.assignment.TestMasterAbortWhileMergingTable.java,**/client.TestFromClientSide.java,**/client.TestAdmin1.java,**/client.TestFromClientSideWithCoprocessor.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/master.procedure.TestMasterFailoverWithProcedures.java,**/regionserver.TestSplitTransactionOnCluster.java clean test -fae > /testptch/patchprocess/patch-unit-hbase-shell.txt 2>&1
{code}
In this case, there was a modification to a shell script, which led to running 
the shell tests.

However, TestShell was excluded from the QA run, defeating the purpose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20968) list_procedures_test fails due to no matching regex

2018-07-28 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20968:
--

 Summary: list_procedures_test fails due to no matching regex
 Key: HBASE-20968
 URL: https://issues.apache.org/jira/browse/HBASE-20968
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


From the test output against hadoop3:
{code}
2018-07-28 12:04:24,838 DEBUG [Time-limited test] 
procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, 
hasLock=false; org.apache.hadoop.hbase.client.procedure.ShellTestProcedure
2018-07-28 12:04:24,864 INFO  [RS-EventLoopGroup-1-3] 
ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, 
version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), 
service=MasterService
2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): 
Checking to see if procedure is done pid=11
F
===
Failure: 
test_list_procedures(Hbase::ListProceduresTest)
src/test/ruby/shell/list_procedures_test.rb:65:in `block in 
test_list_procedures'
 62: end
 63:   end
 64:
  => 65:   assert_equal(1, matching_lines)
 66: end
 67:   end
 68: end
<1> expected but was
<0>
===
...
2018-07-28 12:04:25,374 INFO  [PEWorker-9] procedure2.ProcedureExecutor(1316): 
Finished pid=12, state=SUCCESS, hasLock=false; 
org.apache.hadoop.hbase.client.procedure.ShellTestProcedure in 336msec
{code}
The ShellTestProcedure completed only after the assertion had been raised.
{code}
def create_procedure_regexp(table_name)
  regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \
{code}
The regex used by the test isn't found in the test output either.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20966) RestoreTool#getTableInfoPath should look for completed snapshot only

2018-07-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20966:
--

 Summary: RestoreTool#getTableInfoPath should look for completed 
snapshot only
 Key: HBASE-20966
 URL: https://issues.apache.org/jira/browse/HBASE-20966
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


[~gubjanos] reported seeing the following error when running backup / restore 
test on Azure:
{code}
2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - 
run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException:
 Couldn't read snapshot info 
from:wasb://hbase3-m30wub1711kond-115...@humbtesting8wua.blob.core.windows.net/user/hbase/backup_loc/backup_1532538064246/default/table_fnfawii1za/.hbase-snapshot/.tmp/.snapshotinfo
2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - 
run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at 
org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:328)
2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - 
run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at 
org.apache.hadoop.hbase.backup.util.RestoreServerUtil.getTableDesc(RestoreServerUtil.java:237)
2018-07-25 17:03:56,662|INFO|MainThread|machine.py:167 - 
run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at 
org.apache.hadoop.hbase.backup.util.RestoreServerUtil.restoreTableAndCreate(RestoreServerUtil.java:351)
2018-07-25 17:03:56,662|INFO|MainThread|machine.py:167 - 
run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at 
org.apache.hadoop.hbase.backup.util.RestoreServerUtil.fullRestoreTable(RestoreServerUtil.java:186)
{code}
Here is the related code in the master branch:
{code}
  Path getTableInfoPath(TableName tableName) throws IOException {
Path tableSnapShotPath = getTableSnapshotPath(backupRootPath, tableName, 
backupId);
Path tableInfoPath = null;

// can't build the path directly as the timestamp values are different
FileStatus[] snapshots = fs.listStatus(tableSnapShotPath);
{code}
In the above code, we don't exclude incomplete snapshots, leading to an exception 
later when reading the snapshot info.
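The intended filtering can be sketched with plain strings standing in for the FileStatus entries returned by listStatus. The ".tmp" working-directory name follows the path shown in the stack trace above; the helper class is illustrative, not the actual fix.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: skip in-progress snapshot directories when listing.
public class SnapshotFilter {
    static List<String> completedSnapshots(List<String> dirNames) {
        return dirNames.stream()
            // incomplete snapshots live under the .tmp working directory
            .filter(name -> !name.equals(".tmp"))
            .collect(Collectors.toList());
    }
}
```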



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20917) MetaTableMetrics#stop references uninitialized requestsMap for non-meta region

2018-07-21 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20917:
--

 Summary: MetaTableMetrics#stop references uninitialized 
requestsMap for non-meta region
 Key: HBASE-20917
 URL: https://issues.apache.org/jira/browse/HBASE-20917
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


I noticed the following in test output:
{code}
2018-07-21 15:54:43,181 ERROR [RS_CLOSE_REGION-regionserver/172.17.5.4:0-1] 
executor.EventHandler(186): Caught throwable while processing event 
M_RS_CLOSE_REGION
java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.coprocessor.MetaTableMetrics.stop(MetaTableMetrics.java:329)
  at 
org.apache.hadoop.hbase.coprocessor.BaseEnvironment.shutdown(BaseEnvironment.java:91)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionEnvironment.shutdown(RegionCoprocessorHost.java:165)
  at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.shutdown(CoprocessorHost.java:290)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.postEnvCall(RegionCoprocessorHost.java:559)
  at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:622)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postClose(RegionCoprocessorHost.java:551)
  at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1678)
  at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1484)
  at 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
{code}
{{requestsMap}} is only initialized for the meta region.
However, the check for the meta region is absent in the stop method:
{code}
  public void stop(CoprocessorEnvironment e) throws IOException {
// since meta region can move around, clear stale metrics when stop.
for (String meterName : requestsMap.keySet()) {
{code}
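The missing guard can be sketched with pure JDK code (the class is a hypothetical stand-in for MetaTableMetrics, not its actual implementation): stop() should no-op for regions whose start path never initialized requestsMap, i.e. every non-meta region.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: stop() tolerates an uninitialized requestsMap.
public class MetaMetricsSketch {
    private Map<String, Long> requestsMap; // only initialized for the meta region

    void startForMeta() { requestsMap = new ConcurrentHashMap<>(); }

    // Returns true if metrics were actually cleared (for illustration only).
    boolean stop() {
        if (requestsMap == null) {
            return false; // non-meta region: nothing was registered
        }
        for (String meterName : requestsMap.keySet()) {
            // unregister the stale meter named meterName here
        }
        requestsMap.clear();
        return true;
    }
}
```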



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20892) [UI] Start / End keys are empty on table.jsp

2018-07-15 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20892:
--

 Summary: [UI] Start / End keys are empty on table.jsp
 Key: HBASE-20892
 URL: https://issues.apache.org/jira/browse/HBASE-20892
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Ted Yu


When viewing table.jsp?name=TestTable, I found that the Start / End keys for 
all the regions were simply dashes without real values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20879) Compacting memstore config should handle lower case

2018-07-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20879:
--

 Summary: Compacting memstore config should handle lower case
 Key: HBASE-20879
 URL: https://issues.apache.org/jira/browse/HBASE-20879
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Tushar Sharma
Assignee: Ted Yu


Tushar reported seeing the following in the region server log after entering 
'basic' (lower case) as the compacting memstore type:
{code}
2018-07-10 19:43:45,944 ERROR [RS_OPEN_REGION-regionserver/c01s22:16020-0] handler.OpenRegionHandler: Failed open of region=usertable,user6379,1531182972304.69abd81a44e9cc3ef9e150709f4f69ab., starting to roll back the global memstore size.
java.io.IOException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1035)
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:900)
  at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:872)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7048)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7006)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6977)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6933)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6884)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic
  at java.lang.Enum.valueOf(Enum.java:238)
  at org.apache.hadoop.hbase.MemoryCompactionPolicy.valueOf(MemoryCompactionPolicy.java:26)
  at org.apache.hadoop.hbase.regionserver.HStore.getMemstore(HStore.java:331)
  at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:271)
  at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5531)
  at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:999)
  at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:996)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more
2018-07-10 19:43:45,944 ERROR [RS_OPEN_REGION-regionserver/c01s22:16020-1] handler.OpenRegionHandler: Failed open of region=temp,,1530511278693.0be48eedc68b9358aa475946d00571f1., starting to roll back the global memstore size.
java.io.IOException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1035)
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:900)
  at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:872)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7048)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7006)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6977)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6933)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6884)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic
  at java.lang.Enum.valueOf(Enum.java:238)
  at org.apache.hadoop.hbase.MemoryCompactionPolicy.valueOf(MemoryCompactionPolicy.java:26)
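The root cause in the traces above is that Enum.valueOf is case-sensitive: the configured value "basic" does not match the upper-case constant BASIC. The sketch below illustrates tolerant parsing with a stand-in enum; the real type is org.apache.hadoop.hbase.MemoryCompactionPolicy, and this is only an illustration of the failure mode, not the actual HBase fix.

```java
// Stand-in enum mirroring org.apache.hadoop.hbase.MemoryCompactionPolicy
// (illustrative only). Enum.valueOf("basic") throws IllegalArgumentException
// because constant lookup is case-sensitive; normalizing the config value
// before the lookup avoids the failure seen in the log.
public class EnumParsing {
    public enum MemoryCompactionPolicy { NONE, BASIC, EAGER }

    public static MemoryCompactionPolicy parsePolicy(String value) {
        // Trim and upper-case (Locale.ROOT avoids locale-dependent casing)
        // so lowercase values from hbase-site.xml resolve correctly.
        return MemoryCompactionPolicy.valueOf(
            value.trim().toUpperCase(java.util.Locale.ROOT));
    }
}
```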

[jira] [Created] (HBASE-20744) Address FindBugs warnings in branch-1

2018-06-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20744:
--

 Summary: Address FindBugs warnings in branch-1
 Key: HBASE-20744
 URL: https://issues.apache.org/jira/browse/HBASE-20744
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


From https://builds.apache.org/job/HBase%20Nightly/job/branch-1/350//JDK8_Nightly_Build_Report_(Hadoop2)/ :
{code}
FindBugs module:hbase-common
Inconsistent synchronization of 
org.apache.hadoop.hbase.io.encoding.EncodedDataBlock$BufferGrabbingByteArrayOutputStream.ourBytes;
 locked 50% of time Unsynchronized access at EncodedDataBlock.java:50% of time 
Unsynchronized access at EncodedDataBlock.java:[line 258]
{code}
{code}
FindBugs module:hbase-hadoop2-compat
java.util.concurrent.ScheduledThreadPoolExecutor stored into non-transient 
field MetricsExecutorImpl$ExecutorSingleton.scheduler At 
MetricsExecutorImpl.java:MetricsExecutorImpl$ExecutorSingleton.scheduler At 
MetricsExecutorImpl.java:[line 51]
{code}
{code}
FindBugs module:hbase-server
instanceof will always return false in 
org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, 
int, int), since a org.apache.hadoop.hbase.quotas.RpcThrottlingException can't 
be a org.apache.hadoop.hbase.quotas.ThrottlingException At 
RegionServerQuotaManager.java:in 
org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, 
int, int), since a org.apache.hadoop.hbase.quotas.RpcThrottlingException can't 
be a org.apache.hadoop.hbase.quotas.ThrottlingException At 
RegionServerQuotaManager.java:[line 193]
instanceof will always return true for all non-null values in 
org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, 
int, int), since all org.apache.hadoop.hbase.quotas.RpcThrottlingException are 
instances of org.apache.hadoop.hbase.quotas.RpcThrottlingException At 
RegionServerQuotaManager.java:for all non-null values in 
org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, 
int, int), since all org.apache.hadoop.hbase.quotas.RpcThrottlingException are 
instances of org.apache.hadoop.hbase.quotas.RpcThrottlingException At 
RegionServerQuotaManager.java:[line 199]
{code}
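The hbase-server warning flags a dead type check: when two exception types are unrelated, instanceof against the sibling type can never succeed. A stand-in reproduction (the class names mimic the HBase quota exceptions but are local hypothetical types, assuming, as FindBugs reports, that RpcThrottlingException does not extend ThrottlingException):

```java
// Minimal reproduction of the FindBugs "instanceof will always return
// false" warning: the two exception types are siblings, not subtypes,
// so the instanceof test is dead for RpcThrottlingException instances.
public class DeadInstanceof {
    public static class ThrottlingException extends Exception {}
    public static class RpcThrottlingException extends Exception {}

    public static boolean wasThrottled(Exception e) {
        // Always false for RpcThrottlingException: the types are unrelated.
        // (The companion warning notes the mirror case: testing an
        // RpcThrottlingException against its own type is always true.)
        return e instanceof ThrottlingException;
    }
}
```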



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20743) ASF License warnings for branch-1

2018-06-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20743:
--

 Summary: ASF License warnings for branch-1
 Key: HBASE-20743
 URL: https://issues.apache.org/jira/browse/HBASE-20743
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


From https://builds.apache.org/job/HBase%20Nightly/job/branch-1/350/artifact/output-general/patch-asflicense-problems.txt :
{code}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? hbase-error-prone/target/checkstyle-result.xml
 !? 
hbase-error-prone/target/classes/META-INF/services/com.google.errorprone.bugpatterns.BugChecker
 !? 
hbase-error-prone/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst
 !? 
hbase-error-prone/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst
{code}
Looks like they should be excluded.





[jira] [Created] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-06-14 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20734:
--

 Summary: Colocate recovered edits directory with hbase.wal.dir
 Key: HBASE-20734
 URL: https://issues.apache.org/jira/browse/HBASE-20734
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


During investigation of HBASE-20723, I realized that we wouldn't get the best 
performance for recovered edits when hbase.wal.dir is configured on different 
(faster) media than the hbase rootdir, since the recovered edits directory 
currently lives under rootdir.

Such a setup may not yield fast recovery on region server failover.

This issue is to find a proper (hopefully backward-compatible) way to colocate 
the recovered edits directory with hbase.wal.dir.





[jira] [Reopened] (HBASE-20672) Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at every monitoring interval

2018-06-11 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20672:


> Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at 
> every monitoring interval
> -
>
> Key: HBASE-20672
> URL: https://issues.apache.org/jira/browse/HBASE-20672
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Ankit Jain
>Assignee: Ankit Jain
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-20672.branch-1.001.patch, 
> HBASE-20672.master.001.patch, HBASE-20672.master.002.patch, 
> HBASE-20672.master.003.patch, hits1vs2.4.40.400.png
>
>
> HBase currently provides counters for read/write requests (ReadRequestCount, 
> WriteRequestCount). Since counters that reset only after a restart of the 
> service are not easy to use, we would like to expose two new metrics in 
> HBase, ReadRequestRate and WriteRequestRate, at the region server level.
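The proposed rate metrics can be sketched as a windowed derivative: sample the monotonically increasing request counter at each monitoring interval and report the delta per second. Class and method names below are illustrative, not the actual patch.

```java
// Hypothetical sketch of a per-interval request rate derived from a
// monotonically increasing counter (as ReadRequestCount/WriteRequestCount
// are). Each sample() call closes the current window and starts a new one,
// which is what "reset at every monitoring interval" amounts to.
public class RequestRateCalculator {
    private long lastCount;
    private long lastTimeMs;

    public RequestRateCalculator(long initialCount, long initialTimeMs) {
        this.lastCount = initialCount;
        this.lastTimeMs = initialTimeMs;
    }

    /** Returns requests/second since the previous sample, then resets the window. */
    public double sample(long currentCount, long currentTimeMs) {
        long deltaCount = currentCount - lastCount;
        long deltaMs = currentTimeMs - lastTimeMs;
        lastCount = currentCount;
        lastTimeMs = currentTimeMs;
        // Guard against a zero or backwards clock delta.
        return deltaMs <= 0 ? 0.0 : (deltaCount * 1000.0) / deltaMs;
    }
}
```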





[jira] [Resolved] (HBASE-20577) Make Log Level page design consistent with the design of other pages in UI

2018-06-06 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20577.

Resolution: Fixed

Thanks for the addendum

> Make Log Level page design consistent with the design of other pages in UI
> --
>
> Key: HBASE-20577
> URL: https://issues.apache.org/jira/browse/HBASE-20577
> Project: HBase
>  Issue Type: Improvement
>  Components: UI, Usability
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20577.master.001.patch, 
> HBASE-20577.master.002.patch, HBASE-20577.master.ADDENDUM.patch, 
> after_patch_LogLevel_CLI.png, after_patch_get_log_level.png, 
> after_patch_require_field_validation.png, after_patch_set_log_level_bad.png, 
> after_patch_set_log_level_success.png, 
> before_patch_no_validation_required_field.png, rest_after_addendum_patch.png
>
>
> The Log Level page in the web UI seems out of place. I think we should make 
> it look consistent with the design of other pages in the HBase web UI.
> Also, required fields should be validated; otherwise the user should not be 
> allowed to click the submit button.





[jira] [Created] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-06-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20690:
--

 Summary: Moving table to target rsgroup needs to handle 
TableStateNotFoundException
 Key: HBASE-20690
 URL: https://issues.apache.org/jira/browse/HBASE-20690
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


This is related code:
{code}
  if (targetGroup != null) {
    for (TableName table : tables) {
      if (master.getAssignmentManager().isTableDisabled(table)) {
        LOG.debug("Skipping move regions because the table" + table + " is disabled.");
        continue;
      }
{code}
In a stack trace [~rmani] showed me:
{code}
2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state
org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1
  at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
  at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
  at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
  at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
  at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
  at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
  at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
  at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
  at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
  at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
  at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
  at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
  at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
  at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
The logic should take potential TableStateNotFoundException into account.
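A minimal sketch of the suggested handling, using local stand-in types (TableStateNotFoundException here stands in for the nested org.apache.hadoop.hbase.master.TableStateManager.TableStateNotFoundException, and StateChecker stands in for the AssignmentManager call): treat a missing table-state entry explicitly instead of letting the exception escape.

```java
// Illustrative sketch only, not the actual HBase code: wrap the
// disabled-table check so a missing table-state entry (e.g. table creation
// still in flight when the rsgroup move runs) is handled explicitly.
public class TableMoveHelper {
    public static class TableStateNotFoundException extends RuntimeException {}

    public interface StateChecker {
        boolean isTableDisabled(String table) throws TableStateNotFoundException;
    }

    /** Returns true if region moves for this table should be skipped. */
    public static boolean shouldSkip(StateChecker checker, String table) {
        try {
            return checker.isTableDisabled(table);
        } catch (TableStateNotFoundException e) {
            // State not yet published: treat as "not disabled" rather than
            // failing the whole move operation.
            return false;
        }
    }
}
```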





[jira] [Created] (HBASE-20680) Master hung during initialization waiting on hbase:meta to be assigned which never does

2018-06-04 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20680:
--

 Summary: Master hung during initialization waiting on hbase:meta 
to be assigned which never does
 Key: HBASE-20680
 URL: https://issues.apache.org/jira/browse/HBASE-20680
 Project: HBase
  Issue Type: Bug
Reporter: Josh Elser


When running IntegrationTestRSGroups, the test hung waiting on the master to 
be initialized.

The hbase cluster was launched without RSGroup config. The test script adds 
the required RSGroup configs to hbase-site.xml and restarts the cluster.
It seems that, at one point while the master was trying to assign meta, the 
destination regionserver was in the middle of going down. This left HBase in 
a state where it starts the regionserver recovery procedures but never 
actually gets hbase:meta assigned.

{code}

2018-06-01 10:47:50,024 INFO  [PEWorker-5] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740}]

2018-06-01 10:47:50,026 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore: hsync completed for hdfs://ctr-e138-1518143905142-340983-03-14.hwx.site:8020/apps/hbase/data/MasterProcWALs/pv2-0002.log

2018-06-01 10:47:50,026 INFO  [PEWorker-3] procedure.MasterProcedureScheduler: pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740

2018-06-01 10:47:50,026 DEBUG [PEWorker-3] assignment.RegionStates: setting location=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190 for rit=OFFLINE, location=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190, table=hbase:meta, region=1588230740 last loc=null

2018-06-01 10:47:50,026 INFO  [PEWorker-3] assignment.AssignProcedure: Starting pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,region=1588230740; rit=OFFLINE, location=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190; forceNewPlan=false, retain=true target svr=null

{code}

At Fri Jun  1 10:48:04, master was restarted.

The new master picked up pid=41:

{code}

2018-06-01 10:48:47,971 INFO  [PEWorker-1] assignment.AssignProcedure: Starting pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,region=1588230740; rit=OFFLINE, location=null; forceNewPlan=false, retain=false target svr=null
{code}

There was no further log for pid=41 after above.

Later, when the master initiated another meta recovery procedure (pid=42), the 
second procedure appears to be locked out by the former:

{code}
2018-06-01 10:49:34,292 INFO  [PEWorker-2] procedure.MasterProcedureScheduler: pid=43, ppid=42, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740, target=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190 checking lock on 1588230740

2018-06-01 10:49:34,293 DEBUG [PEWorker-2] assignment.RegionTransitionProcedure: LOCK_EVENT_WAIT pid=43 serverLocks={}, namespaceLocks={}, tableLocks={}, regionLocks={{1588230740=exclusiveLockOwner=41, sharedLockCount=0, waitingProcCount=1}}, peerLocks={}
{code}





[jira] [Created] (HBASE-20677) Backport HBASE-20566 'Creating a system table after enabling rsgroup feature puts region into RIT ' to branch-2

2018-06-03 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20677:
--

 Summary: Backport HBASE-20566 'Creating a system table after 
enabling rsgroup feature puts region into RIT ' to branch-2
 Key: HBASE-20677
 URL: https://issues.apache.org/jira/browse/HBASE-20677
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


After HBASE-20566 was integrated into master, HBASE-20595 removed the concept 
of 'special tables' from rsgroups.

This task is to backport the fix to branch-2.

TestRSGroups#testRSGroupsWithHBaseQuota would be added.





[jira] [Created] (HBASE-20676) Give .hbase-snapshot proper ownership upon directory creation

2018-06-02 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20676:
--

 Summary: Give .hbase-snapshot proper ownership upon directory 
creation
 Key: HBASE-20676
 URL: https://issues.apache.org/jira/browse/HBASE-20676
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


This is a continuation of the discussion over HBASE-20668.

The .hbase-snapshot directory is not created at cluster startup. Normally it 
is created when a snapshot operation is initiated.

However, if before any snapshot operation is performed, some non-super user 
from another cluster conducts ExportSnapshot to this cluster, the 
.hbase-snapshot directory would be created as that user.
(This is just one scenario that can lead to wrong ownership.)

This JIRA is to seek proper way(s) to ensure that the .hbase-snapshot 
directory always carries proper ownership and permissions upon creation.





[jira] [Created] (HBASE-20668) Exception from FileSystem operation in finally block of ExportSnapshot#doWork may hide exception from FileUtil.copy call

2018-06-01 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20668:
--

 Summary: Exception from FileSystem operation in finally block of 
ExportSnapshot#doWork may hide exception from FileUtil.copy call
 Key: HBASE-20668
 URL: https://issues.apache.org/jira/browse/HBASE-20668
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


I was debugging the following error [~romil.choksi] saw while testing 
ExportSnapshot:
{code}
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|2018-06-01 02:40:52,358 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|java.io.FileNotFoundException: Directory/File does not exist /apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_334546
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1777)
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:82)
{code}
Here is the corresponding code (with extra logging added):
{code}
try {
  LOG.info("Copy Snapshot Manifest from " + snapshotDir + " to " + initialOutputSnapshotDir);
  boolean ret = FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir,
      false, false, conf);
  LOG.info("return val = " + ret);
} catch (IOException e) {
  LOG.warn("Failed to copy the snapshot directory: from=" +
      snapshotDir + " to=" + initialOutputSnapshotDir, e);
  throw new ExportSnapshotException("Failed to copy the snapshot directory: from=" +
      snapshotDir + " to=" + initialOutputSnapshotDir, e);
} finally {
  if (filesUser != null || filesGroup != null) {
    LOG.warn((filesUser == null ? "" : "Change the owner of " + needSetOwnerDir + " to "
        + filesUser)
        + (filesGroup == null ? "" : ", Change the group of " + needSetOwnerDir + " to "
        + filesGroup));
    setOwner(outputFs, needSetOwnerDir, filesUser, filesGroup, true);
  }
{code}
"return val = " was not seen in the rerun of the test.
This is what the additional logging revealed:
{code}
2018-06-01 09:22:54,247|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|2018-06-01 09:22:54,241 WARN  [main] snapshot.ExportSnapshot: Failed to copy the snapshot directory: from=hdfs://ns1/apps/hbase/data/.hbase-snapshot/snapshot_table_157842 to=hdfs://ns3/apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_157842
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/.hbase-snapshot/.tmp":hrt_qa:hadoop:drx-wT
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
2018-06-01 09:22:54,249|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:255)
{code}
It turned out that the exception from the {{setOwner}} call in the finally 
block eclipsed the real exception.
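The masking behavior is easy to reproduce in isolation. The sketch below shows the buggy shape from the snippet above and one common remedy, attaching the cleanup failure as a suppressed exception so the original cause survives; this is an illustration of the pitfall, not the actual patch.

```java
// Demonstrates how an exception thrown from a finally block replaces the
// exception thrown in the try block, and the suppressed-exception fix.
public class FinallyMasking {
    static void cleanupThatFails() { throw new IllegalStateException("setOwner failed"); }

    /** Buggy shape: the IOException from the body is lost. */
    public static Exception buggy() {
        try {
            try {
                throw new java.io.IOException("copy failed");
            } finally {
                cleanupThatFails();   // this exception wins, masking the IOException
            }
        } catch (Exception e) {
            return e;
        }
    }

    /** Fixed shape: the cleanup failure is recorded as suppressed. */
    public static Exception fixed() {
        Exception primary = null;
        try {
            throw new java.io.IOException("copy failed");
        } catch (Exception e) {
            primary = e;
        } finally {
            try {
                cleanupThatFails();
            } catch (RuntimeException re) {
                if (primary != null) primary.addSuppressed(re);
            }
        }
        return primary;
    }
}
```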





[jira] [Reopened] (HBASE-20639) Implement permission checking through AccessController instead of RSGroupAdminEndpoint

2018-05-29 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20639:


> Implement permission checking through AccessController instead of 
> RSGroupAdminEndpoint
> --
>
> Key: HBASE-20639
> URL: https://issues.apache.org/jira/browse/HBASE-20639
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Nihal Jain
>Priority: Major
> Attachments: HBASE-20639.master.001.patch, 
> HBASE-20639.master.002.patch, HBASE-20639.master.002.patch
>
>
> Currently permission checking for various RS group operations is done via 
> RSGroupAdminEndpoint.
> e.g. in RSGroupAdminServiceImpl#moveServers() :
> {code}
> checkPermission("moveServers");
> groupAdminServer.moveServers(hostPorts, request.getTargetGroup());
> {code}
> The practice in the remaining parts of HBase is to perform permission 
> checking within AccessController.
> Now that observer hooks for RS group operations are in the right place, we 
> should follow best practice and move permission checking to AccessController.





[jira] [Created] (HBASE-20654) Expose regions in transition thru JMX

2018-05-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20654:
--

 Summary: Expose regions in transition thru JMX
 Key: HBASE-20654
 URL: https://issues.apache.org/jira/browse/HBASE-20654
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


Currently only the count of regions in transition is exposed thru JMX.
Here is a sample snippet of the /jmx output:
{code}
{
  "beans" : [ {
...
  }, {
"name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManager",
"modelerType" : "Master,sub=AssignmentManager",
"tag.Context" : "master",
...
"ritCount" : 3
{code}
It would be desirable to expose the region name and state for the regions in 
transition as well.
We can place a configurable upper bound on the number of entries returned in 
case there are many regions in transition.
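One way to implement the bounded listing (an illustrative sketch with hypothetical names, not the actual patch): build the (region, state) entries for the JMX attribute and truncate at a configurable limit so a mass-RIT event cannot blow up the /jmx payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed JMX attribute: emit "region=state"
// pairs for regions in transition, capped at maxEntries, with a trailing
// marker recording how many entries were omitted.
public class RitSnapshot {
    public static List<String> describe(Map<String, String> regionToState, int maxEntries) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : regionToState.entrySet()) {
            if (out.size() >= maxEntries) {
                out.add("... " + (regionToState.size() - maxEntries) + " more omitted");
                break;
            }
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```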





[jira] [Created] (HBASE-20653) Add missing observer hooks for region server group to MasterObserver

2018-05-27 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20653:
--

 Summary: Add missing observer hooks for region server group to 
MasterObserver
 Key: HBASE-20653
 URL: https://issues.apache.org/jira/browse/HBASE-20653
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Currently the following region server group operations don't have a 
corresponding hook in MasterObserver:

* getRSGroupInfo
* getRSGroupInfoOfServer
* getRSGroupInfoOfTable
* listRSGroup

This JIRA is to 

* add them to MasterObserver
* add corresponding permission check in AccessController
* move the {{checkPermission}} out of RSGroupAdminEndpoint
* add corresponding tests to TestRSGroupsWithACL





[jira] [Resolved] (HBASE-20079) Report all the new test classes missing HBaseClassTestRule in one patch

2018-05-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20079.

Resolution: Later

> Report all the new test classes missing HBaseClassTestRule in one patch
> ---
>
> Key: HBASE-20079
> URL: https://issues.apache.org/jira/browse/HBASE-20079
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Trivial
>
> Currently if there are both new small and large tests without 
> HBaseClassTestRule in a single patch, the QA bot would report the small test 
> class as missing HBaseClassTestRule but not the large test.
> All new test classes missing HBaseClassTestRule should be reported in the 
> same QA run.





[jira] [Resolved] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-05-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20081.

Resolution: Cannot Reproduce

> TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
> --
>
> Key: HBASE-20081
> URL: https://issues.apache.org/jira/browse/HBASE-20081
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Major
>
> https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/
>  was one recent occurrence.
> I noticed two things in test output:
> {code}
> 2018-02-25 18:12:45,053 WARN  [Time-limited test-EventThread] master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 is not online or isn't known to the master.The latter could be caused by a DNS misconfiguration.
> {code}
> Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the 
> above should not have been logged.
> {code}
> 2018-02-25 18:16:51,531 WARN  [master/asf912:0.Chore.1] master.CatalogJanitor(127): Failed scan of catalog table
> java.io.IOException: connection is closed
>   at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119)
>   at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> The above was possibly related to the lost region server.
> I searched the test output of a successful run, where neither of the above 
> two messages can be seen.





[jira] [Created] (HBASE-20644) Master shutdown due to service ClusterSchemaServiceImpl failing to start

2018-05-24 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20644:
--

 Summary: Master shutdown due to service ClusterSchemaServiceImpl 
failing to start
 Key: HBASE-20644
 URL: https://issues.apache.org/jira/browse/HBASE-20644
 Project: HBase
  Issue Type: Bug
Reporter: Romil Choksi


From hbase-hbase-master-ctr-e138-1518143905142-329221-01-03.hwx.site.log :
{code}
2018-05-23 22:14:29,750 ERROR [master/ctr-e138-1518143905142-329221-01-03:2] master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
  at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
  at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
  at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1054)
  at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:918)
  at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2023)
{code}
Earlier in the log, the namespace region was deemed OPEN on 
01-07.hwx.site,16020,1527112194788, which was declared not online:
{code}
2018-05-23 21:54:34,786 INFO  [master/ctr-e138-1518143905142-329221-01-03:2] assignment.RegionStateStore: Load hbase:meta entry region=01a7f9ba9fffd691f261d3fbc620da06, regionState=OPEN, lastHost=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788, regionLocation=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788, seqnum=43
2018-05-23 21:54:34,787 INFO  [master/ctr-e138-1518143905142-329221-01-03:2] assignment.AssignmentManager: Number of RegionServers=1
2018-05-23 21:54:34,788 INFO  [master/ctr-e138-1518143905142-329221-01-03:2] assignment.AssignmentManager: KILL RegionServer=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788 hosting regions but not online.
{code}
Later, even though a different instance on 007 registered with the master:
{code}
2018-05-23 21:55:13,541 INFO  [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.ServerManager: Registering regionserver=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112506002
...
2018-05-23 21:55:43,881 INFO  [master/ctr-e138-1518143905142-329221-01-03:2] client.RpcRetryingCallerImpl: Call exception, tries=12, retries=12, started=69001 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06. is not online on ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112506002
  at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
  at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
{code}
There was no OPEN request sent to that instance.

From hbase-hbase-regionserver-ctr-e138-1518143905142-329221-01-07.hwx.site.log :
{code}
2018-05-23 21:52:27,414 INFO  [RS_CLOSE_REGION-regionserver/ctr-e138-1518143905142-329221-01-07:16020-1] regionserver.HRegion: Closed hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06.
{code}
Then region server 007 restarted:
{code}
Wed May 23 21:55:03 UTC 2018 Starting regionserver on 
ctr-e138-1518143905142-329221-01-07.hwx.site
{code}
After that, the region 01a7f9ba9fffd691f261d3fbc620da06 never showed up again 
in the 007 log.





[jira] [Created] (HBASE-20639) Implement permission checking through AccessController instead of RSGroupAdminEndpoint

2018-05-24 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20639:
--

 Summary: Implement permission checking through AccessController 
instead of RSGroupAdminEndpoint
 Key: HBASE-20639
 URL: https://issues.apache.org/jira/browse/HBASE-20639
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Currently permission checking for various RS group operations is done via 
RSGroupAdminEndpoint.
e.g. in RSGroupAdminServiceImpl#moveServers() :
{code}
checkPermission("moveServers");
groupAdminServer.moveServers(hostPorts, request.getTargetGroup());
{code}
The practice in the remaining parts of HBase is to perform permission 
checking within AccessController.

Now that observer hooks for RS group operations are in the right place, we 
should follow best practice and move permission checking to AccessController.





[jira] [Reopened] (HBASE-20627) Relocate RS Group pre/post hooks from RSGroupAdminServer to RSGroupAdminEndpoint

2018-05-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20627:


> Relocate RS Group pre/post hooks from RSGroupAdminServer to 
> RSGroupAdminEndpoint
> 
>
> Key: HBASE-20627
> URL: https://issues.apache.org/jira/browse/HBASE-20627
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: 20627.branch-1.txt, 20627.v1.txt, 20627.v2.txt, 
> 20627.v3.txt
>
>
> Currently RS Group pre/post hooks are called from RSGroupAdminServer.
> e.g. RSGroupAdminServer#removeRSGroup :
> {code}
>   if (master.getMasterCoprocessorHost() != null) {
> master.getMasterCoprocessorHost().preRemoveRSGroup(name);
>   }
> {code}
> RSGroupAdminServer#removeRSGroup is called by RSGroupAdminEndpoint :
> {code}
> checkPermission("removeRSGroup");
> groupAdminServer.removeRSGroup(request.getRSGroupName());
> {code}
> If permission check fails, the pre hook wouldn't be called.





[jira] [Created] (HBASE-20627) Relocate RS Group pre/post hooks from RSGroupAdminServer to RSGroupAdminEndpoint

2018-05-23 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20627:
--

 Summary: Relocate RS Group pre/post hooks from RSGroupAdminServer 
to RSGroupAdminEndpoint
 Key: HBASE-20627
 URL: https://issues.apache.org/jira/browse/HBASE-20627
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 20627.v1.txt

Currently RS Group pre/post hooks are called from RSGroupAdminServer.
e.g. RSGroupAdminServer#removeRSGroup :
{code}
  if (master.getMasterCoprocessorHost() != null) {
master.getMasterCoprocessorHost().preRemoveRSGroup(name);
  }
{code}
RSGroupAdminServer#removeRSGroup is called by RSGroupAdminEndpoint :
{code}
checkPermission("removeRSGroup");
groupAdminServer.removeRSGroup(request.getRSGroupName());
{code}
If permission check fails, the pre hook wouldn't be called.
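The ordering problem can be modeled with stand-in classes (names are made up, not the real HBase code): when the permission check runs in the endpoint before the server-side method, a denied call never reaches the pre-hook; relocating the hook to the endpoint lets observers see the attempt regardless.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the hook-ordering issue; illustrative names only.
class OrderingSketch {
  final List<String> events = new ArrayList<>();
  boolean allow;

  void checkPermission(String op) {
    if (!allow) throw new SecurityException("denied: " + op);
  }

  // Current shape: the hook lives inside the server-side method, so it
  // only fires after checkPermission() has already succeeded.
  void removeRSGroupViaServer(String name) {
    checkPermission("removeRSGroup");   // endpoint-level check
    events.add("preRemoveRSGroup");     // hook fired from RSGroupAdminServer
    events.add("removed " + name);
  }

  // One possible relocated shape: the hook fires from the endpoint, so
  // observers see the attempt even when the permission check fails.
  void removeRSGroupViaEndpoint(String name) {
    events.add("preRemoveRSGroup");
    checkPermission("removeRSGroup");
    events.add("removed " + name);
  }
}
```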





[jira] [Created] (HBASE-20609) SnapshotHFileCleaner#init should check that params is not null

2018-05-21 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20609:
--

 Summary: SnapshotHFileCleaner#init should check that params is not 
null
 Key: HBASE-20609
 URL: https://issues.apache.org/jira/browse/HBASE-20609
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


Noticed the following in the test output of TestHFileArchiving :
{code}
SnapshotHFileCleaner.init(Map) line: 79
HFileCleaner(CleanerChore).newFileCleaner(String, Configuration) line: 260
HFileCleaner(CleanerChore).initCleanerChain(String) line: 232
HFileCleaner(CleanerChore).<init>(String, int, Stoppable, Configuration, 
FileSystem, Path, String, Map) line: 182
HFileCleaner.<init>(int, Stoppable, Configuration, FileSystem, Path, 
Map) line: 104
HFileCleaner.<init>(int, Stoppable, Configuration, FileSystem, Path) line: 51
TestHFileArchiving.testCleaningRace() line: 377
{code}
This was due to SnapshotHFileCleaner#init not checking that the parameter {{params}} is non-null.
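A hedged sketch of the guard being asked for (the class below and the "master" key are illustrative stand-ins, not the actual SnapshotHFileCleaner code):

```java
import java.util.Map;

// Sketch: init(Map) must tolerate a null params map, since the cleaner
// chore may construct cleaners without one. Names are hypothetical.
class CleanerInitSketch {
  Object master;

  void init(Map<String, Object> params) {
    // Guard against a null map before dereferencing it.
    if (params != null) {
      this.master = params.get("master");
    }
  }
}
```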





[jira] [Created] (HBASE-20578) Support region server group in target cluster

2018-05-13 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20578:
--

 Summary: Support region server group in target cluster
 Key: HBASE-20578
 URL: https://issues.apache.org/jira/browse/HBASE-20578
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu


When source tables belong to non-default region server group(s) and there are 
region server group counterparts in the target cluster, we should support 
restoring to the target cluster using the region server group mapping.





[jira] [Created] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException

2018-05-09 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20552:
--

 Summary: HBase RegionServer was shutdown due to 
UnexpectedStateException
 Key: HBASE-20552
 URL: https://issues.apache.org/jira/browse/HBASE-20552
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Romil Choksi


This was observed during cluster testing (source code sync'ed with hbase-2.0, 
built May 2nd):
{code}
2018-05-02 05:44:10,089 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] 
master.MasterRpcServices: Region server 
ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported a 
fatal error:
* ABORTING region server 
ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: 
org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28 reported OPEN on 
server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but 
state has otherwise.
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
  at 
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
  at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
rit=OPEN, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28 reported OPEN on 
server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but 
state has otherwise.
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
  ... 7 more
 *
Cause:
org.apache.hadoop.hbase.YouAreDeadException: 
org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28 reported OPEN on 
server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but 
state has otherwise.
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065)
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987)
  at 
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459)
  at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: 
rit=OPEN, 
location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, 
table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28 reported OPEN on 
server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but 
state has otherwise.
  at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037)
  ... 7 more

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
{code}
[~elserj] and I did some initial analysis.

In the following description, M1 refers to 
master-ctr-e138-1518143905142-279227-01-05 and M2 refers to 
master-ctr-e138-1518143905142-279227-01-03.

Let's follow region 94f6ca283dbb4445b2bcdc321b734d28 .

Master 1 was moving the region to 07:
{code}
2018-05-02 05:38:59,017 INFO  
[master/ctr-e138-1518143905142-279227-01-05:2.Chore.1] master.HMaster: 
balance hri=94f6ca283dbb4445b2bcdc321b734d28, 

[jira] [Created] (HBASE-20530) Composition of backup directory incorrectly contains namespace when restoring

2018-05-03 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20530:
--

 Summary: Composition of backup directory incorrectly contains 
namespace when restoring
 Key: HBASE-20530
 URL: https://issues.apache.org/jira/browse/HBASE-20530
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Here is partial listing of output from incremental backup:
{code}
5306 2018-05-04 02:38 
hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/table_almphxih4u/cf1/5648501da7194783947bbf07b172f07e
{code}
When restoring, here is what HBackupFileSystem.getTableBackupDir returns:
{code}
fileBackupDir=hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/default/table_almphxih4u
{code}
You can see that the namespace gets in the way, leading to the restore being 
unable to find the proper hfile.
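The mismatch can be sketched as two path-composition helpers. These are illustrative stand-ins, not the real code paths (the buggy one stands in for what HBackupFileSystem.getTableBackupDir returns):

```java
// Hypothetical sketch: the backup writes hfiles directly under
// <root>/<backupId>/<table>, while the restore-side helper inserts the
// namespace, producing a directory that does not exist.
class BackupPathSketch {
  static String writtenDir(String root, String backupId, String table) {
    return root + "/" + backupId + "/" + table;
  }

  static String restoreLookupDir(String root, String backupId,
                                 String namespace, String table) {
    // Buggy composition: the namespace gets inserted into the path.
    return root + "/" + backupId + "/" + namespace + "/" + table;
  }
}
```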





[jira] [Created] (HBASE-20508) TestIncrementalBackupWithBulkLoad doesn't need to be Parameterized test

2018-04-29 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20508:
--

 Summary: TestIncrementalBackupWithBulkLoad doesn't need to be 
Parameterized test
 Key: HBASE-20508
 URL: https://issues.apache.org/jira/browse/HBASE-20508
 Project: HBase
  Issue Type: Test
  Components: backuprestore
Reporter: Ted Yu


TestIncrementalBackupWithBulkLoad is currently Parameterized with only one 
value returned from its data() method.
In its constructor, this value is ignored.

TestIncrementalBackupWithBulkLoad doesn't need to be Parameterized.





[jira] [Created] (HBASE-20495) REST unit test fails with NoClassDefFoundError against hadoop3

2018-04-26 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20495:
--

 Summary: REST unit test fails with NoClassDefFoundError against 
hadoop3
 Key: HBASE-20495
 URL: https://issues.apache.org/jira/browse/HBASE-20495
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu


The following was first observed in the test output of rest.TestDeleteRow 
against hadoop3:
{code}
java.lang.NoClassDefFoundError: 
com/sun/jersey/core/spi/factory/AbstractRuntimeDelegate
Caused by: java.lang.ClassNotFoundException: 
com.sun.jersey.core.spi.factory.AbstractRuntimeDelegate
{code}
This was due to the following transitive dependency on jersey 1.19:
{code}
[INFO] +- org.apache.hbase:hbase-testing-util:jar:2.0.0.3.0.0.0-SNAPSHOT:test
[INFO] |  +- 
org.apache.hbase:hbase-zookeeper:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test
[INFO] |  +- 
org.apache.hbase:hbase-hadoop-compat:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test
[INFO] |  +- 
org.apache.hbase:hbase-hadoop2-compat:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test
[INFO] |  +- 
org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.0.0:compile
[INFO] |  |  \- 
org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.0.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:test-jar:tests:3.0.0:test
[INFO] |  |  \- com.sun.jersey:jersey-server:jar:1.19:compile
{code}





[jira] [Resolved] (HBASE-20473) Ineffective INFO logging adjustment in HFilePerformanceEvaluation

2018-04-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20473.

Resolution: Not A Problem

> Ineffective INFO logging adjustment in HFilePerformanceEvaluation
> -
>
> Key: HBASE-20473
> URL: https://issues.apache.org/jira/browse/HBASE-20473
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   // Disable verbose INFO logging from org.apache.hadoop.io.compress.CodecPool
>   static {
> System.setProperty("org.apache.commons.logging.Log",
>   "org.apache.commons.logging.impl.SimpleLog");
> {code}
> The above code has no effect since we're migrating away from commons-logging.





[jira] [Created] (HBASE-20473) Ineffective INFO logging adjustment in HFilePerformanceEvaluation

2018-04-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20473:
--

 Summary: Ineffective INFO logging adjustment in 
HFilePerformanceEvaluation
 Key: HBASE-20473
 URL: https://issues.apache.org/jira/browse/HBASE-20473
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


{code}
  // Disable verbose INFO logging from org.apache.hadoop.io.compress.CodecPool
  static {
System.setProperty("org.apache.commons.logging.Log",
  "org.apache.commons.logging.impl.SimpleLog");
{code}
The above code has no effect since we're migrating away from commons-logging.





[jira] [Resolved] (HBASE-20436) IntegrationTestSparkBulkLoad cannot access abstract processOptions of AbstractHBaseTool

2018-04-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20436.

Resolution: Not A Problem

> IntegrationTestSparkBulkLoad cannot access abstract processOptions of 
> AbstractHBaseTool
> ---
>
> Key: HBASE-20436
> URL: https://issues.apache.org/jira/browse/HBASE-20436
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Yu
>Priority: Major
>
> Saw the following compilation error in hbase-spark-it module:
> {code}
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /hbase/hbase-spark-it/src/test/java/org/apache/hadoop/hbase/spark/IntegrationTestSparkBulkLoad.java:[638,10]
>  abstract method 
> processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine)
>  in org.apache.hadoop.hbase.util.AbstractHBaseTool cannot be accessed directly
> {code}
> The processOptions method of AbstractHBaseTool is abstract.





[jira] [Created] (HBASE-20436) IntegrationTestSparkBulkLoad cannot access abstract processOptions of AbstractHBaseTool

2018-04-17 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20436:
--

 Summary: IntegrationTestSparkBulkLoad cannot access abstract 
processOptions of AbstractHBaseTool
 Key: HBASE-20436
 URL: https://issues.apache.org/jira/browse/HBASE-20436
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Ted Yu


Saw the following compilation error in hbase-spark-it module:
{code}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] 
/hbase/hbase-spark-it/src/test/java/org/apache/hadoop/hbase/spark/IntegrationTestSparkBulkLoad.java:[638,10]
 abstract method 
processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine) 
in org.apache.hadoop.hbase.util.AbstractHBaseTool cannot be accessed directly
{code}
The processOptions method of AbstractHBaseTool is abstract.





[jira] [Created] (HBASE-20414) TestLockProcedure#testMultipleLocks may fail on slow machine

2018-04-13 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20414:
--

 Summary: TestLockProcedure#testMultipleLocks may fail on slow 
machine
 Key: HBASE-20414
 URL: https://issues.apache.org/jira/browse/HBASE-20414
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


Here was recent failure : 
https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/172/testReport/junit/org.apache.hadoop.hbase.master.locking/TestLockProcedure/health_checks___yetus_jdk8_hadoop2_checks___testMultipleLocks/
{code}
java.lang.AssertionError: expected: but was:
at 
org.apache.hadoop.hbase.master.locking.TestLockProcedure.sendHeartbeatAndCheckLocked(TestLockProcedure.java:221)
at 
org.apache.hadoop.hbase.master.locking.TestLockProcedure.testMultipleLocks(TestLockProcedure.java:311)
{code}
In the test output, we can see this:
{code}
2018-04-13 20:19:54,230 DEBUG [Time-limited test] 
locking.TestLockProcedure(225): Proc id 22 : LOCKED.
...
2018-04-13 20:19:55,529 DEBUG [Time-limited test] 
procedure2.ProcedureExecutor(865): Stored pid=26, state=RUNNABLE; 
org.apache.hadoop.hbase.master.locking.LockProcedure 
regions=a7f9adfd047350eabb199a39564ba4db,c1e297609590b707233a2f9c8bb51fa1, 
type=EXCLUSIVE

2018-04-13 20:19:56,230 DEBUG [ProcExecTimeout] locking.LockProcedure(207): 
Timeout failure ProcedureEvent for pid=22, state=WAITING_TIMEOUT; 
org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, 
type=EXCLUSIVE, ready=false, [pid=22, state=WAITING_TIMEOUT; 
org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, 
type=EXCLUSIVE]
{code}
After the pid=26 log, the code does this (1 second wait):
{code}
// Assert tables & region locks are waiting because of namespace lock.
Thread.sleep(HEARTBEAT_TIMEOUT / 2);
{code}

On a slow machine (in the case above), there was only 730 msec between the 
queueing of regionsLock2 and the WAITING_TIMEOUT state of the nsLock. The 1 
second wait was too long, leading to assertion failure.
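A generic way to de-flake this kind of timing assertion is to poll the condition with a deadline instead of sleeping a fixed fraction of HEARTBEAT_TIMEOUT. This is a sketch of the pattern, not the fix actually committed for HBASE-20414:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll cond every intervalMs until it holds or
// timeoutMs elapses, rather than sleeping once and hoping the state
// transition has already happened.
class WaitSketch {
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier cond)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (cond.getAsBoolean()) {
        return true;                 // condition reached before the deadline
      }
      Thread.sleep(intervalMs);
    }
    return cond.getAsBoolean();      // last chance after the deadline
  }
}
```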





[jira] [Created] (HBASE-20375) Remove use of getCurrentUserCredentials in hbase-spark module

2018-04-09 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20375:
--

 Summary: Remove use of getCurrentUserCredentials in hbase-spark 
module
 Key: HBASE-20375
 URL: https://issues.apache.org/jira/browse/HBASE-20375
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


When compiling hbase-spark module against Spark 2.3.0 release, we would get:
{code}
[ERROR] 
/a/hbase/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:68:
 error: value getCurrentUserCredentials is not a member of 
org.apache.spark.deploy.SparkHadoopUtil
[ERROR]   @transient var credentials = 
SparkHadoopUtil.get.getCurrentUserCredentials()
[ERROR]^
[ERROR] 
/a/hbase/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:236:
 error: value getCurrentUserCredentials is not a member of 
org.apache.spark.deploy.SparkHadoopUtil
[ERROR] credentials = SparkHadoopUtil.get.getCurrentUserCredentials()
[ERROR]   ^
[ERROR] two errors found
{code}
{{getCurrentUserCredentials}} was removed by SPARK-22372.

This issue is to replace the call to {{getCurrentUserCredentials}} with a call 
to {{UserGroupInformation.getCurrentUser().getCredentials()}}.





[jira] [Created] (HBASE-20325) ReassignPartitionsClusterTest#shouldMoveSubsetOfPartitions is flaky

2018-04-01 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20325:
--

 Summary: 
ReassignPartitionsClusterTest#shouldMoveSubsetOfPartitions is flaky
 Key: HBASE-20325
 URL: https://issues.apache.org/jira/browse/HBASE-20325
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


Saw this from 
https://builds.apache.org/job/kafka-trunk-jdk8/2518/testReport/junit/kafka.admin/ReassignPartitionsClusterTest/shouldMoveSubsetOfPartitions/
 :
{code}
kafka.common.AdminCommandFailedException: Partition reassignment currently in 
progress for Map(topic1-0 -> Buffer(100, 102), topic1-2 -> Buffer(100, 102), 
topic2-1 -> Buffer(101, 100), topic2-2 -> Buffer(100, 102)). Aborting operation
at 
kafka.admin.ReassignPartitionsCommand.reassignPartitions(ReassignPartitionsCommand.scala:612)
at 
kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:215)
at 
kafka.admin.ReassignPartitionsClusterTest.shouldMoveSubsetOfPartitions(ReassignPartitionsClusterTest.scala:242)
{code}





[jira] [Reopened] (HBASE-20159) Support using separate ZK quorums for client

2018-03-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20159:


> Support using separate ZK quorums for client
> 
>
> Key: HBASE-20159
> URL: https://issues.apache.org/jira/browse/HBASE-20159
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, Operability, Zookeeper
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: 20159.addendum, 20159.addendum2.patch, 
> HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch
>
>
> Currently we are using the same zookeeper quorums for client and server, 
> which puts us at risk: if a burst of client connections exhausts zookeeper, 
> a RegionServer might abort due to zookeeper session loss. Actually 
> we have suffered from this many times in production.
> Here we propose to allow client to use different ZK quorums, through below 
> settings:
> {noformat}
> hbase.client.zookeeper.quorum
> hbase.client.zookeeper.property.clientPort
> hbase.client.zookeeper.observer.mode
> {noformat}
> The first two are for specifying client zookeeper properties, and the third 
> one indicating whether the client ZK nodes are in observer mode. If the 
> client ZK are not observer nodes, HMaster will take responsibility for 
> synchronizing necessary meta information (such as meta location and master 
> address, etc.) from the server ZK to the client ZK.





[jira] [Resolved] (HBASE-20123) Backup test fails against hadoop 3

2018-03-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20123.

Resolution: Duplicate

Should be fixed by HADOOP-15289

> Backup test fails against hadoop 3
> --
>
> Key: HBASE-20123
> URL: https://issues.apache.org/jira/browse/HBASE-20123
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> When running backup unit test against hadoop3, I saw:
> {code}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 88.862 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes
> [ERROR] 
> testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes)
>   Time elapsed: 86.206 s  <<< ERROR!
> java.io.IOException: java.io.IOException: Failed copy from 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
> hdfs://localhost:40578/backupUT
>   at 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
> Caused by: java.io.IOException: Failed copy from 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
> hdfs://localhost:40578/backupUT
>   at 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
> {code}
> In the test output, I found:
> {code}
> 2018-03-03 14:46:10,858 ERROR [Time-limited test] 
> mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic 
> link
> java.io.IOException: Path 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic 
> link
>   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338)
>   at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461)
>   at 
> org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308)
>   at 
> org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297)
>   at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
> {code}
> It seems the failure was related to how we use distcp.





[jira] [Created] (HBASE-20272) TestAsyncTable#testCheckAndMutateWithTimeRange fails due to TableExistsException

2018-03-23 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20272:
--

 Summary: TestAsyncTable#testCheckAndMutateWithTimeRange fails due 
to TableExistsException
 Key: HBASE-20272
 URL: https://issues.apache.org/jira/browse/HBASE-20272
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu


The following test failure is reproducible:
{code}
org.apache.hadoop.hbase.TableExistsException: testCheckAndMutateWithTimeRange
 at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:233)
 at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:87)
 at 
org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:51)
 at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
 at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
 at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1453)
{code}
The cause was that TestAsyncTable is parameterized while 
testCheckAndMutateWithTimeRange uses the same table name without dropping the 
table after the first invocation finishes.
This leads to the second invocation failing with TableExistsException.
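A common remedy, sketched here with a made-up naming scheme rather than the fix actually committed, is to derive a distinct table name per parameterized run:

```java
// Hypothetical helper: combine the test method name with the parameter
// index so each parameterized invocation creates (and later drops) its
// own table instead of colliding on a shared name.
class TableNameSketch {
  static String tableNameFor(String method, int paramIndex) {
    return method + "-" + paramIndex;
  }
}
```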





[jira] [Created] (HBASE-20257) hbase-spark should not depend on com.google.code.findbugs.jsr305

2018-03-22 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20257:
--

 Summary: hbase-spark should not depend on 
com.google.code.findbugs.jsr305
 Key: HBASE-20257
 URL: https://issues.apache.org/jira/browse/HBASE-20257
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


The following can be observed in the build output of master branch:
{code}
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed 
with message:
We don't allow the JSR305 jar from the Findbugs project, see HBASE-16321.
Found Banned Dependency: com.google.code.findbugs:jsr305:jar:1.3.9
Use 'mvn dependency:tree' to locate the source of the banned dependencies.
{code}
Here is related snippet from hbase-spark/pom.xml:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
</dependency>
{code}
Dependency on jsr305 should be dropped.





[jira] [Created] (HBASE-20244) NoSuchMethodException when retrieving private method decryptEncryptedDataEncryptionKey from DFSClient

2018-03-21 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20244:
--

 Summary: NoSuchMethodException when retrieving private method 
decryptEncryptedDataEncryptionKey from DFSClient
 Key: HBASE-20244
 URL: https://issues.apache.org/jira/browse/HBASE-20244
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


I was running unit test against hadoop 3.0.1 RC and saw the following in test 
output:
{code}
ERROR [RS-EventLoopGroup-3-3] 
asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper(267): Couldn't properly 
initialize access to HDFS internals. Please update  your WAL Provider to not 
make use of the 'asyncfs' provider. See HBASE-16110 for more information.
java.lang.NoSuchMethodException: 
org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(org.apache.hadoop.fs.FileEncryptionInfo)
  at java.lang.Class.getDeclaredMethod(Class.java:2130)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.createTransparentCryptoHelper(FanOutOneBlockAsyncDFSOutputSaslHelper.java:232)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.<clinit>(FanOutOneBlockAsyncDFSOutputSaslHelper.java:262)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:661)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:118)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:720)
  at 
org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:715)
  at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
  at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
  at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
  at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
  at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
  at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
  at 
org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:306)
  at 
org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:341)
  at 
org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
  at 
org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
{code}
The private method was moved by HDFS-12574 to HdfsKMSUtil with a different 
signature.

To accommodate the above method movement, it seems we need to call the 
following method of DFSClient :
{code}
  public KeyProvider getKeyProvider() throws IOException {
{code}
Since the new decryptEncryptedDataEncryptionKey method has this signature:
{code}
  static KeyVersion decryptEncryptedDataEncryptionKey(FileEncryptionInfo feInfo,
      KeyProvider keyProvider) throws IOException {
{code}
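The accommodation can be sketched as a reflective lookup with a fallback when NoSuchMethodException is thrown (which is what happens against Hadoop 3.x after HDFS-12574). The helper below uses stand-in classes in place of DFSClient / HdfsKMSUtil so the sketch stays self-contained; the real fix also has to adjust the argument list to the new signature.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: try the old home of a relocated method first,
// then fall back to its new home on NoSuchMethodException.
class ReflectionFallbackSketch {
  static Method find(Class<?> primary, Class<?> fallback, String name,
                     Class<?>... argTypes) throws NoSuchMethodException {
    try {
      return primary.getDeclaredMethod(name, argTypes);
    } catch (NoSuchMethodException e) {
      // Method was relocated: look it up on the alternative class.
      return fallback.getDeclaredMethod(name, argTypes);
    }
  }
}
```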





[jira] [Reopened] (HBASE-20214) Review of RegionLocationFinder Class

2018-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20214:


> Review of RegionLocationFinder Class
> 
>
> Key: HBASE-20214
> URL: https://issues.apache.org/jira/browse/HBASE-20214
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer, master
>Affects Versions: 2.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-20214.1.patch
>
>
> # Use SLF4J parameter logging
>  # Remove superfluous code
>  # Replace code with re-usable libraries where possible
>  # Use different data structure
>  # Small perf improvements
>  # Fix some checkstyle





[jira] [Created] (HBASE-20196) Maintain all regions with same size in memstore flusher

2018-03-14 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20196:
--

 Summary: Maintain all regions with same size in memstore flusher
 Key: HBASE-20196
 URL: https://issues.apache.org/jira/browse/HBASE-20196
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu


Here is the javadoc for getCopyOfOnlineRegionsSortedByOffHeapSize() :
{code}
   *   the biggest.  If two regions are the same size, then the last one found 
wins; i.e. this
   *   method may NOT return all regions.
{code}
Currently the value type is HRegion - we only store one region per size.
I think we should change the value type to Collection so that we don't 
miss any region (potentially one with a big size).

e.g. Suppose there are three regions (R1, R2 and R3) with sizes 100, 100 and 1, 
respectively.
Using the current data structure, R2 would be stored in the Map, evicting R1 
from the Map.
This means that the current code would choose to flush regions R2 and R3, 
releasing 101 from memory.
If the value type is changed to Collection, we would flush both R1 and R2, 
releasing 200. This achieves faster memory reclamation.

Confirmed with [~eshcar] over in HBASE-20090
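A minimal sketch of the collision, using a plain TreeMap with String placeholders standing in for HRegion (method and region names here are illustrative, not from the HBase source):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class RegionSizeMapDemo {
    // Current shape: one region per size key; regions of equal size collide.
    static TreeMap<Long, String> onePerSize() {
        TreeMap<Long, String> bySize = new TreeMap<>(Comparator.reverseOrder());
        bySize.put(100L, "R1");
        bySize.put(100L, "R2"); // evicts R1 -- a size-100 region silently disappears
        bySize.put(1L, "R3");
        return bySize;
    }

    // Proposed shape: every region of a given size is retained in a collection.
    static TreeMap<Long, List<String>> allPerSize() {
        TreeMap<Long, List<String>> bySize = new TreeMap<>(Comparator.reverseOrder());
        bySize.computeIfAbsent(100L, k -> new ArrayList<>()).add("R1");
        bySize.computeIfAbsent(100L, k -> new ArrayList<>()).add("R2");
        bySize.computeIfAbsent(1L, k -> new ArrayList<>()).add("R3");
        return bySize;
    }

    public static void main(String[] args) {
        System.out.println(onePerSize()); // {100=R2, 1=R3}  (R1 lost)
        System.out.println(allPerSize()); // {100=[R1, R2], 1=[R3]}
    }
}
```

With the current shape, the flusher picks R2 and R3 (releasing 101); with the collection-valued map it can flush both size-100 regions (releasing 200).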





[jira] [Resolved] (HBASE-20104) Fix infinite loop of RIT when creating table on a rsgroup that has no online servers

2018-03-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-20104.

   Resolution: Fixed
Fix Version/s: (was: 1.4.3)

Reverted from branch-1 and branch-1.4

Xiaolin:
If you want to backport the patch, please open another JIRA.
This was marked fixed for beta2 which has shipped.

> Fix infinite loop of RIT when creating table on a rsgroup that has no online 
> servers
> 
>
> Key: HBASE-20104
> URL: https://issues.apache.org/jira/browse/HBASE-20104
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.0.0-beta-2
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-20104.branch-1.001.patch, 
> HBASE-20104.branch-1.4.001.patch, HBASE-20104.branch-2.001.patch, 
> HBASE-20104.branch-2.002.patch
>
>
> This error has been reported in 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11635/testReport/org.apache.hadoop.hbase.rsgroup/TestRSGroups/org_apache_hadoop_hbase_rsgroup_TestRSGroups/
> This error can be reproduced by creating a table on an rsgroup whose region 
> servers have all been stopped or decommissioned.





[jira] [Reopened] (HBASE-20104) Fix infinite loop of RIT when creating table on a rsgroup that has no online servers

2018-03-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-20104:


> Fix infinite loop of RIT when creating table on a rsgroup that has no online 
> servers
> 
>
> Key: HBASE-20104
> URL: https://issues.apache.org/jira/browse/HBASE-20104
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.0.0-beta-2
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.0.0-beta-2, 1.4.3
>
> Attachments: HBASE-20104.branch-1.001.patch, 
> HBASE-20104.branch-1.4.001.patch, HBASE-20104.branch-2.001.patch, 
> HBASE-20104.branch-2.002.patch
>
>
> This error has been reported in 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11635/testReport/org.apache.hadoop.hbase.rsgroup/TestRSGroups/org_apache_hadoop_hbase_rsgroup_TestRSGroups/
> This error can be reproduced by creating a table on an rsgroup whose region 
> servers have all been stopped or decommissioned.





[jira] [Created] (HBASE-20176) Fix warnings about Logging import in hbase-spark test code

2018-03-12 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20176:
--

 Summary: Fix warnings about Logging import in hbase-spark test code
 Key: HBASE-20176
 URL: https://issues.apache.org/jira/browse/HBASE-20176
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


This is follow-on to HBASE-16179.

In HBASE-16179 we fixed warnings in non-test code of the following form:
{code}
warning: imported `Logging' is permanently hidden by definition of trait 
Logging in package spark
{code}
However, there are a few warnings not detected by the precommit bot:
{code}
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseConnectionCacheSuite.scala:25: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala:23: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala:23: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/PartitionFilterSuite.scala:21: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
[WARNING]  ^
[WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/PartitionFilterSuite.scala:21: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark
[WARNING] import org.apache.hadoop.hbase.spark.Logging
{code}
This issue is to fix the above warnings in test code.





[jira] [Created] (HBASE-20136) TestKeyValue misses ClassRule and Category annotations

2018-03-05 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20136:
--

 Summary: TestKeyValue misses ClassRule and Category annotations
 Key: HBASE-20136
 URL: https://issues.apache.org/jira/browse/HBASE-20136
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu


hbase-common/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java misses 
ClassRule and Category annotations.

This issue adds the annotations to this test.





[jira] [Created] (HBASE-20123) Backup test fails against hadoop 3

2018-03-03 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20123:
--

 Summary: Backup test fails against hadoop 3
 Key: HBASE-20123
 URL: https://issues.apache.org/jira/browse/HBASE-20123
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


When running a backup unit test against hadoop 3, I saw:
{code}
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 88.862 
s <<< FAILURE! - in org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes
[ERROR] 
testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes)
  Time elapsed: 86.206 s  <<< ERROR!
java.io.IOException: java.io.IOException: Failed copy from 
hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
hdfs://localhost:40578/backupUT
  at 
org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
Caused by: java.io.IOException: Failed copy from 
hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
hdfs://localhost:40578/backupUT
  at 
org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
{code}
In the test output, I found:
{code}
2018-03-03 14:46:10,858 ERROR [Time-limited test] 
mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path 
hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic link
java.io.IOException: Path 
hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic link
  at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338)
  at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461)
  at 
org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155)
  at 
org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308)
  at 
org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163)
  at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
  at 
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
  at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
  at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
  at 
org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297)
  at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
  at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
  at 
org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196)
  at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
  at 
org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408)
  at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348)
  at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290)
  at 
org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
{code}
It seems the failure was related to how we use distcp.





[jira] [Created] (HBASE-20121) Fix findbugs warning for RestoreTablesClient

2018-03-03 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20121:
--

 Summary: Fix findbugs warning for RestoreTablesClient
 Key: HBASE-20121
 URL: https://issues.apache.org/jira/browse/HBASE-20121
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ted Yu


In RestoreTablesClient#restore(), the following variable is not used:
{code}
Set backupIdSet = new HashSet<>();
{code}
There is a backupIdSet#add() call later in the method, but the variable 
doesn't appear in any other part of the code.
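The pattern findbugs flags here, shown in a generic form (the method and parameter names below are illustrative, not taken from RestoreTablesClient):

```java
import java.util.HashSet;
import java.util.Set;

public class DeadStoreDemo {
    // Before: the set is allocated and populated but never read, so findbugs
    // reports it -- every add() call is effectively dead code.
    static int restoreBefore(String[] backupIds) {
        Set<String> backupIdSet = new HashSet<>();
        int restored = 0;
        for (String id : backupIds) {
            backupIdSet.add(id); // written, but never read anywhere in the method
            restored++;
        }
        return restored;
    }

    // After: the unused collection and its add() calls are simply removed;
    // behavior is unchanged.
    static int restoreAfter(String[] backupIds) {
        int restored = 0;
        for (String id : backupIds) {
            restored++;
        }
        return restored;
    }

    public static void main(String[] args) {
        String[] ids = { "b1", "b2", "b1" };
        System.out.println(restoreBefore(ids)); // 3
        System.out.println(restoreAfter(ids));  // 3
    }
}
```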




