[jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Yuan Jiang updated HBASE-13831:
---
Description:

TestHBaseFsck#testParallelHbck is flaky in HADOOP-2.6+ environments.

The idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK fails without retrying, because creating the lock file fails when the 1st HBCK has already created it. However, with HADOOP-2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: It seems that the test is broken because of the new create retry policy in hadoop 2.6. The Namenode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds).)

When I run the TestHBaseFsck#testParallelHbck test against HADOOP-2.7 in a Windows environment (HBASE is branch-1.1) multiple times, the result is unpredictable (sometimes it succeeds, sometimes it fails; failures outnumber successes).

The fix is trivial: leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.

was:

TestHBaseFsck#testParallelHbck is flaky in HADOOP-2.6+ environments.

The idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK fails without retrying, because creating the lock file fails when the 1st HBCK has already created it. However, with HADOOP-2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: It seems that the test is broken because of the new create retry policy in hadoop 2.6. The Namenode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds).)

When I run the TestHBaseFsck#testParallelHbck test against HADOOP-2.7 in a Windows environment (HBASE is branch-1.1) multiple times, the result is unpredictable (sometimes it succeeds, sometimes it fails; failures outnumber successes).

The fix is trivial, to leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
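To make the timing issue in the description concrete, here is a minimal sketch of an exclusive lock-file creation with a bounded total wait. It is an illustration only: it uses the local filesystem through java.nio rather than HDFS, it does not model the HDFS client retry on AlreadyBeingCreatedException, and the class name, method name, and parameters are hypothetical rather than taken from HBaseFsck or the attached patch.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for "create the hbck lock file"; not HBase code.
final class LockFileSketch {
    /**
     * Tries to create an exclusive lock file. While the file already exists,
     * retries every sleepMillis, but never waits longer than maxWaitMillis
     * in total. Returns true if this caller created the file, false if it
     * gave up because another holder still owns the lock.
     */
    static boolean tryCreateLock(Path lockFile, long maxWaitMillis, long sleepMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            try {
                // Files.createFile checks for existence and creates atomically.
                Files.createFile(lockFile);
                return true;
            } catch (FileAlreadyExistsException e) {
                if (System.nanoTime() >= deadline) {
                    return false;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With a small maxWaitMillis, a second caller that races against a live lock holder returns false quickly, which is the outcome the test expects from the 2nd HBCK. With a wait approaching 60 seconds, the result depends on how long the first holder keeps the lock.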
[jira] [Created] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
Stephen Yuan Jiang created HBASE-13831:
---
Summary: TestHBaseFsck#testParallelHbck is flaky
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 1.1.0, 2.0.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor

TestHBaseFsck#testParallelHbck is flaky in HADOOP-2.6+ environments.

The idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK fails without retrying, because creating the lock file fails when the 1st HBCK has already created it. However, with HADOOP-2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: It seems that the test is broken because of the new create retry policy in hadoop 2.6. The Namenode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds).)

When I run the TestHBaseFsck#testParallelHbck test against HADOOP-2.7 in a Windows environment (HBASE is branch-1.1) multiple times, the result is unpredictable (sometimes it succeeds, sometimes it fails; failures outnumber successes).

The fix is trivial, to leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.
[jira] [Commented] (HBASE-13666) book.pdf is not renamed during site build
[ https://issues.apache.org/jira/browse/HBASE-13666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571490#comment-14571490 ]

Gabor Liptak commented on HBASE-13666:
---
Agreed. I will experiment with some pom.xml changes to try to correct this.

book.pdf is not renamed during site build
---
Key: HBASE-13666
URL: https://issues.apache.org/jira/browse/HBASE-13666
Project: HBase
Issue Type: Task
Components: site
Reporter: Nick Dimiduk

Noticed this while testing HBASE-13665. It looks like the post-site hook is broken or not executed, so the file {{book.pdf}} is not copied over to the expected name {{apache_hbase_reference_guide.pdf}}.
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571581#comment-14571581 ]

Apekshit Sharma commented on HBASE-13702:
---
So, combiners and reducers in bulk mode are executed in dry-run mode too. However, TableReducer in non-bulk mode is not run in dry-run mode.

ImportTsv: Add dry-run functionality and log bad rows
---
Key: HBASE-13702
URL: https://issues.apache.org/jira/browse/HBASE-13702
Project: HBase
Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
Attachments: HBASE-13702.patch

The ImportTsv job skips bad records by default (though it keeps a count). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows are corrupted in an input, rather than failing on one row at a time, seems like a good feature to have.

Moreover, tools of this kind should have 'dry-run' functionality, which essentially does a quick run of the tool without making any changes while reporting any errors/warnings and success/failure.

To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that?

For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present.
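The "log bad rows, and use if-else to skip writes in a dry run" idea from the description could look roughly like the sketch below. It is not the ImportTsv mapper and does not use any HBase or MapReduce API: the class, the Result holder, the column-count validity rule, and the in-memory "written" list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained illustration; not the ImportTsv implementation.
final class DryRunImportSketch {
    /** Outcome of one pass over the input. */
    static final class Result {
        final List<String> badLines = new ArrayList<>();  // every rejected line, logged in full
        final List<String[]> written = new ArrayList<>(); // stands in for emitted KVs/mutations
        int goodRows;
    }

    /**
     * Validates each tab-separated line. Bad lines are logged and collected
     * rather than aborting the run; good lines are written only when dryRun
     * is false.
     */
    static Result run(List<String> lines, int expectedColumns, boolean dryRun) {
        Result result = new Result();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            // Assumed validity rule: right column count and a non-empty row key.
            if (fields.length != expectedColumns || fields[0].isEmpty()) {
                result.badLines.add(line);
                System.err.println("Bad line: " + line);
                continue;
            }
            result.goodRows++;
            if (!dryRun) {
                result.written.add(fields); // the only side effect, skipped in a dry run
            }
        }
        return result;
    }
}
```

A dry run and a real run report the same good and bad counts; only the write step differs, which is what makes the dry run a faithful preview.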
[jira] [Commented] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571501#comment-14571501 ]

Stephen Yuan Jiang commented on HBASE-13831:
---
Tested the patch on both Linux and Windows: ran it multiple times with no failures.

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Yuan Jiang updated HBASE-13831:
---
Attachment: HBASE-13831.patch

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Yuan Jiang updated HBASE-13831:
---
Fix Version/s: 1.1.1, 1.2.0, 2.0.0
Status: Patch Available (was: Open)

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 1.1.0, 2.0.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571697#comment-14571697 ]

Stephen Yuan Jiang commented on HBASE-13832:
---
[~mbertozzi] [~enis] FYI

Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
---
Key: HBASE-13832
URL: https://issues.apache.org/jira/browse/HBASE-13832
Project: HBase
Issue Type: Sub-task
Components: master, proc-v2
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

When the number of data nodes is fewer than 3, we get a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting.

{noformat}
2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
{noformat}

One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort.
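The proposal above (roll the log once on a sync failure, abort only if that also fails) can be sketched as follows. The LogWriter interface and the method name are hypothetical and deliberately minimal; this is not the WALProcedureStore or FSHLog API, only the control flow the proposal describes.

```java
import java.io.IOException;

// Hypothetical control-flow sketch; not the WALProcedureStore implementation.
final class SyncRetrySketch {
    /** Assumed minimal view of a log writer. */
    interface LogWriter {
        void sync() throws IOException;
        void roll() throws IOException; // close the current log and open a new one
    }

    /**
     * Returns true if the data was synced, either directly or after rolling
     * to a new log. Returns false if the caller should abort, that is, if
     * rolling failed or the sync on the new log failed as well.
     */
    static boolean syncWithRollFallback(LogWriter writer) {
        try {
            writer.sync();
            return true;
        } catch (IOException syncFailure) {
            try {
                writer.roll(); // a new log gets a fresh write pipeline
                writer.sync();
                return true;
            } catch (IOException rollFailure) {
                return false;
            }
        }
    }
}
```

The single fallback attempt mirrors the wording of the proposal; whether more than one roll should be attempted is a design choice the issue leaves open.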
[jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-13831:
---
Summary: TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+ (was: TestHBaseFsck#testParallelHbck is flaky)

TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Updated] (HBASE-13834) Evict count not properly passed to HeapMemoryTuner.
[ https://issues.apache.org/jira/browse/HBASE-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhilash updated HBASE-13834:
---
Summary: Evict count not properly passed to HeapMemoryTuner. (was: Evict count not properly passed to passed to HeapMemoryTuner.)

Evict count not properly passed to HeapMemoryTuner.
---
Key: HBASE-13834
URL: https://issues.apache.org/jira/browse/HBASE-13834
Project: HBase
Issue Type: Bug
Components: hbase, regionserver
Affects Versions: 2.0.0
Reporter: Abhilash
Assignee: Abhilash
Priority: Trivial
Labels: easyfix
Attachments: EvictCountBug.patch

The evict count that is calculated in the tune function of the HeapMemoryManager class and passed to HeapMemoryTuner via TunerContext is miscalculated. It is supposed to be the evict count between two intervals, but it is not.
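For readers unfamiliar with the "count between two intervals" requirement, a per-interval value is normally derived from a running total by remembering the previous reading. The sketch below assumes the cache exposes a cumulative eviction counter; that assumption, and all names in it, are illustrative and are not taken from HeapMemoryManager or EvictCountBug.patch.

```java
// Hypothetical illustration of turning a cumulative counter into a per-interval delta.
final class EvictDeltaSketch {
    private long lastCumulativeEvictCount; // reading taken at the previous tuning round

    /**
     * Given the cache's running eviction total, returns only the evictions
     * that happened since the previous call, then remembers the new total.
     */
    long evictedSinceLastTune(long cumulativeEvictCount) {
        long delta = cumulativeEvictCount - lastCumulativeEvictCount;
        lastCumulativeEvictCount = cumulativeEvictCount;
        return delta;
    }
}
```

Passing the running total itself to a tuner would make every interval look busier than the last, even when no evictions occurred in between.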
[jira] [Updated] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-13833:
---
Attachment: HBASE-13833.02.branch-1.1.patch

Might as well use the same connection for the table instance and region locator instances we pass down too.

LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
---
Key: HBASE-13833
URL: https://issues.apache.org/jira/browse/HBASE-13833
Project: HBase
Issue Type: Bug
Affects Versions: 1.1.0.1
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13833.00.branch-1.1.patch, HBASE-13833.01.branch-1.1.patch, HBASE-13833.02.branch-1.1.patch

Seems HBASE-13328 wasn't quite sufficient.

{noformat}
015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS
2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled
2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK?
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
...
...
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception.
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745)
...
...
...
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged.
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at
[jira] [Commented] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571687#comment-14571687 ]

Hadoop QA commented on HBASE-13831:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12737346/HBASE-13831.patch
  against master branch at commit fad545652fc330d11061a1608e7812dade7f0845.
  ATTACHMENT ID: 12737346

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14270//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14270//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14270//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14270//console

This message is automatically generated.

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Commented] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571764#comment-14571764 ]

Stephen Yuan Jiang commented on HBASE-13831:
---
The failed TestReplicationKillSlaveRS.xml test has nothing to do with this patch.

TestHBaseFsck#testParallelHbck is flaky
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch
[jira] [Updated] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-13833:
---
Attachment: HBASE-13833.00.branch-1.1.patch

Attaching a patch for branch-1.1. Still testing some combinations.

LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
---
Key: HBASE-13833
URL: https://issues.apache.org/jira/browse/HBASE-13833
Project: HBase
Issue Type: Bug
Affects Versions: 1.1.0.1
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13833.00.branch-1.1.patch

Seems HBASE-13328 wasn't quite sufficient.

{noformat}
015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS
2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled
2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK?
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
...
...
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception.
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745)
...
...
...
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged.
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:542)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at
[jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571917#comment-14571917 ] Duo Zhang commented on HBASE-13811: --- {code} flushOpSeqId = getNextSequenceId(wal); flushedSeqId = getFlushedSequenceId(encodedRegionName, flushOpSeqId); {code} I think the problem is here... Before the patch, getFlushedSequenceId would return flushOpSeqId because getEarliestMemstoreSeqNum would return HConstants.NO_SEQNUM. But now that getEarliestMemstoreSeqNum has been modified, it also considers the sequenceIds recorded in flushingSequenceIds, so it will not return HConstants.NO_SEQNUM even if we decide to flush all stores. I think this may cause us to replay unnecessary edits. So the problem is that here we only want to consider the sequenceIds in lowestUnflushedSequenceIds; maybe add a new method? Thanks. [~stack] Splitting WALs, we are filtering out too many edits - DATALOSS --- Key: HBASE-13811 URL: https://issues.apache.org/jira/browse/HBASE-13811 Project: HBase Issue Type: Bug Components: wal Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0 Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of data loss. My patch for HBASE-13616 was in place, so I can only think it is the cause (but cannot see how). When we split the logs, we are skipping legitimate edits. Digging.
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13832: --- Affects Version/s: 1.2.0 2.0.0 1.1.0 Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang When the data node count is less than 3, we get a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort.
[jira] [Created] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
Nick Dimiduk created HBASE-13833: Summary: LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad Key: HBASE-13833 URL: https://issues.apache.org/jira/browse/HBASE-13833 Project: HBase Issue Type: Bug Affects Versions: 1.1.0.1 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 2.0.0, 1.2.0, 1.1.1 Seems HBASE-13328 wasn't quite sufficient. {noformat} 015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS 2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled 2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK? 2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 ... ... 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception. 
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745) ... ... ... 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged. 
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:542) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$4.call(LoadIncrementalHFiles.java:733) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$4.call(LoadIncrementalHFiles.java:720) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126) 2015-06-02
[jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571866#comment-14571866 ] stack commented on HBASE-13811: --- TestSplitWalDataLoss hangs for me when I run it on the command line but passes in the IDE. Let's see how it does here. I can make a simpler fix to backport to branch-1.1 in case anyone wants to try per-cf-flush. Testing this latest patch on a cluster to see how it does. Splitting WALs, we are filtering out too many edits - DATALOSS --- Key: HBASE-13811 URL: https://issues.apache.org/jira/browse/HBASE-13811 Project: HBase Issue Type: Bug Components: wal Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0 Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of data loss. My patch for HBASE-13616 was in place, so I can only think it is the cause (but cannot see how). When we split the logs, we are skipping legitimate edits. Digging.
[jira] [Commented] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571910#comment-14571910 ] Nick Dimiduk commented on HBASE-13833: -- Yeah, you're right on the log message. Thanks for the catch. LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad --- Key: HBASE-13833 URL: https://issues.apache.org/jira/browse/HBASE-13833 Project: HBase Issue Type: Bug Affects Versions: 1.1.0.1 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13833.00.branch-1.1.patch Seems HBASE-13328 wasn't quite sufficient. {noformat} 015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS 2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled 2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK? 
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 ... ... 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception. 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2015-06-02 
05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745) ... ... ... 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged. 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:542) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at
[jira] [Commented] (HBASE-13784) Add Async Client Table API
[ https://issues.apache.org/jira/browse/HBASE-13784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571911#comment-14571911 ] stack commented on HBASE-13784: --- bq. I am thinking of either making a clean Table implementation using AsyncTable internally or using the AsyncTable already within HTable itself. Ok. Yes, it would be nice if you could slot in your AsyncTable everywhere to make sure it works in 'sync' mode, blocking on the promise. Can you use junit parameterize (http://junit.org/apidocs/org/junit/runners/Parameterized.html) so tests run with the current HTable implementation and then again with your Async'd HTable implementation? You could choose a few of the important client-side junits and have them work in both modes. Looking at TestFromClientSide, probably the biggest client-side set of tests, it still has a bunch of deprecated 'new HTable' usage. I could help by doing cleanup on this test so it all goes via Connection to get Table instances. We would then have to jigger Connection and ConnectionFactory per parameterized run to return pure HTable for one run, as it does now, and then your fancy async'y version for the next run. If you make a list of junit test suites you'd like to parameterize and note obstacles to getting your AsyncTable in under them, I can help with unblocking them. Thanks [~jurmous] Add Async Client Table API -- Key: HBASE-13784 URL: https://issues.apache.org/jira/browse/HBASE-13784 Project: HBase Issue Type: New Feature Reporter: Jurriaan Mous Assignee: Jurriaan Mous Attachments: HBASE-13784-v1.patch, HBASE-13784-v2.patch, HBASE-13784-v3.patch, HBASE-13784-v4.patch, HBASE-13784.patch With the introduction of the Async HBase RPC Client it is possible to create an Async Table API and more. This issue is focussed on creating a first async Table API so it is possible to do any non-deprecated Table call in an async way.
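The "sync mode, blocking on the promise" idea in the comment above can be sketched in plain Java. This is only an illustration under assumptions: AsyncKv, InMemoryAsyncKv and BlockingKv are hypothetical names invented for the example and are not the AsyncTable or HTable APIs from the patch; CompletableFuture stands in for whatever promise type the patch uses.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical async API: every call returns a promise instead of blocking.
interface AsyncKv {
    CompletableFuture<Void> put(String row, String value);
    CompletableFuture<String> get(String row);
}

// Toy in-memory implementation standing in for a real async client.
class InMemoryAsyncKv implements AsyncKv {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Void> put(String row, String value) {
        return CompletableFuture.runAsync(() -> data.put(row, value));
    }

    @Override
    public CompletableFuture<String> get(String row) {
        return CompletableFuture.supplyAsync(() -> data.get(row));
    }
}

// Synchronous facade: delegates to the async API and blocks on the promise.
class BlockingKv {
    private final AsyncKv delegate;

    BlockingKv(AsyncKv delegate) {
        this.delegate = delegate;
    }

    void put(String row, String value) {
        delegate.put(row, value).join();   // wait for the write to complete
    }

    String get(String row) {
        return delegate.get(row).join();   // wait for the read to complete
    }
}

class BlockingKvDemo {
    public static void main(String[] args) {
        BlockingKv table = new BlockingKv(new InMemoryAsyncKv());
        table.put("row1", "v1");
        System.out.println(table.get("row1"));
    }
}
```

With a facade like this, the same test body can be run once against the existing synchronous implementation and once against the blocking wrapper, which is the kind of two-mode run the parameterized tests would exercise.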
[jira] [Commented] (HBASE-13784) Add Async Client Table API
[ https://issues.apache.org/jira/browse/HBASE-13784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571913#comment-14571913 ] stack commented on HBASE-13784: --- If you want feedback on the current state of the patch, add it to reviews.apache.org and paste the link here, and I (or others) can give you a bit of feedback. Thanks. Add Async Client Table API -- Key: HBASE-13784 URL: https://issues.apache.org/jira/browse/HBASE-13784 Project: HBase Issue Type: New Feature Reporter: Jurriaan Mous Assignee: Jurriaan Mous Attachments: HBASE-13784-v1.patch, HBASE-13784-v2.patch, HBASE-13784-v3.patch, HBASE-13784-v4.patch, HBASE-13784.patch With the introduction of the Async HBase RPC Client it is possible to create an Async Table API and more. This issue is focussed on creating a first async Table API so it is possible to do any non-deprecated Table call in an async way.
[jira] [Commented] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571912#comment-14571912 ] Hudson commented on HBASE-13831: SUCCESS: Integrated in HBase-1.2 #132 (See [https://builds.apache.org/job/HBase-1.2/132/]) HBASE-13831 TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+ (Stephen Jiang) (tedyu: rev 86d30e0bfe62098415a41af2ae7f534488919f24) * hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+ --- Key: HBASE-13831 URL: https://issues.apache.org/jira/browse/HBASE-13831 Project: HBase Issue Type: Bug Components: hbck, test Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Priority: Minor Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13831.patch TestHBaseFsck#testParallelHbck is flaky in HADOOP-2.6+ environments. The idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK fails without retrying because creating the lock file fails: the 1st HBCK has already created it. However, with HADOOP-2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: It seems that the test is broken due to the new create retry policy in hadoop 2.6. The Namenode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds).) When I run the TestHBaseFsck#testParallelHbck test against HADOOP-2.7 in a Windows environment (HBASE is branch-1.1) multiple times, the result is unpredictable (sometimes it succeeds, sometimes it fails, with more failures than successes). The fix is trivial: leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.
[jira] [Commented] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571871#comment-14571871 ] stack commented on HBASE-13833: --- +1 I think this log message is the wrong way around: LOG.warn("unmanaged connection cannot be used for bulkload. Creating managed connection."); Fix on commit if I have it wrong. LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad --- Key: HBASE-13833 URL: https://issues.apache.org/jira/browse/HBASE-13833 Project: HBase Issue Type: Bug Affects Versions: 1.1.0.1 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 2.0.0, 1.2.0, 1.1.1 Attachments: HBASE-13833.00.branch-1.1.patch Seems HBASE-13328 wasn't quite sufficient. {noformat} 015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS 2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled 2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK? 
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 ... ... 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception. 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2015-06-02 
05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745) ... ... ... 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged. 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at
[jira] [Updated] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13827: --- Status: Open (was: Patch Available) Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() happens on the HFileScanner. Right now HFileScanner has no close() at all; we will add one. The StoreFileScanner will call it in its own close(). In KVHeap, when we see that one of the child scanners has run out of cells, we remove it from the PriorityQueue and also close it. StoreScanner does the same kind of thing. But once close() returns blocks, this kind of early close is not correct: there might still be cells created out of these cached blocks. This Jira aims at changing these container scanners not to close early. When a child scanner is no longer required, the container will stop using it completely but will not call close(). Instead the child will be added to a separate list for delayed close, and it will be closed when the container scanner's close() happens.
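The delayed-close behavior described above can be sketched roughly as follows. This is a simplified illustration, not the patch: ChildScanner and MergingScanner are hypothetical stand-ins for the child scanners and for container scanners such as KeyValueHeap, and cells are modeled as plain ints.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical child scanner: yields sorted ints and records when it is closed.
class ChildScanner {
    private final int[] cells;
    private int pos = 0;
    boolean closed = false;

    ChildScanner(int... cells) { this.cells = cells; }
    boolean hasNext() { return pos < cells.length; }
    int peek() { return cells[pos]; }
    int next() { return cells[pos++]; }
    void close() { closed = true; }
}

// Container that merges children; exhausted children are parked, not closed.
class MergingScanner {
    private final PriorityQueue<ChildScanner> heap =
        new PriorityQueue<>(Comparator.comparingInt(ChildScanner::peek));
    private final List<ChildScanner> delayedClose = new ArrayList<>();

    MergingScanner(List<ChildScanner> children) {
        for (ChildScanner c : children) {
            if (c.hasNext()) heap.add(c); else delayedClose.add(c);
        }
    }

    boolean hasNext() { return !heap.isEmpty(); }

    int next() {
        ChildScanner top = heap.poll();
        if (top == null) throw new NoSuchElementException();
        int cell = top.next();
        if (top.hasNext()) {
            heap.add(top);          // still has cells: back into the heap
        } else {
            delayedClose.add(top);  // exhausted: stop using it, close later
        }
        return cell;
    }

    // Only here are the children closed, including the exhausted ones.
    void close() {
        for (ChildScanner c : heap) c.close();
        for (ChildScanner c : delayedClose) c.close();
        heap.clear();
        delayedClose.clear();
    }
}

class MergingScannerDemo {
    public static void main(String[] args) {
        ChildScanner a = new ChildScanner(1, 4);
        ChildScanner b = new ChildScanner(2, 3, 5);
        MergingScanner m = new MergingScanner(List.of(a, b));
        while (m.hasNext()) System.out.print(m.next() + " ");
        System.out.println("a closed before container close? " + a.closed);
        m.close();
        System.out.println("a closed after container close? " + a.closed);
    }
}
```

The point of the separate list is that anything handed out from an exhausted child remains valid until the caller closes the container.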
[jira] [Updated] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13827: --- Attachment: HBASE-13827.patch Reattach for another QA run Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch, HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() happens on the HFileScanner. Right now HFileScanner has no close() at all; we will add one. The StoreFileScanner will call it in its own close(). In KVHeap, when we see that one of the child scanners has run out of cells, we remove it from the PriorityQueue and also close it. StoreScanner does the same kind of thing. But once close() returns blocks, this kind of early close is not correct: there might still be cells created out of these cached blocks. This Jira aims at changing these container scanners not to close early. When a child scanner is no longer required, the container will stop using it completely but will not call close(). Instead the child will be added to a separate list for delayed close, and it will be closed when the container scanner's close() happens.
[jira] [Updated] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13827: --- Status: Patch Available (was: Open) Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch, HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() happens on the HFileScanner. Right now HFileScanner has no close() at all; we will add one. The StoreFileScanner will call it in its own close(). In KVHeap, when we see that one of the child scanners has run out of cells, we remove it from the PriorityQueue and also close it. StoreScanner does the same kind of thing. But once close() returns blocks, this kind of early close is not correct: there might still be cells created out of these cached blocks. This Jira aims at changing these container scanners not to close early. When a child scanner is no longer required, the container will stop using it completely but will not call close(). Instead the child will be added to a separate list for delayed close, and it will be closed when the container scanner's close() happens.
[jira] [Created] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
Stephen Yuan Jiang created HBASE-13832: -- Summary: Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang When the data node count is less than 3, we get a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort.
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571737#comment-14571737 ]

Enis Soztutar commented on HBASE-13832:
---
I think we should copy the same semantics as the FSHLog sync / log roll behavior. What we have in FSHLog / LogRoller is this:
- The log syncer catches IOException, logs it, and requests a log roll.
- The log roller tries to roll the log, and if it gets an IOException in file close, or a generic IOException while rolling, it aborts the RS.

The reason to have the same semantics is that we do not want the master to abort prematurely on a recoverable IOException like the one in the jira title. If the RS can ride over generic IOExceptions, the master should do the same.

Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
---
Key: HBASE-13832
URL: https://issues.apache.org/jira/browse/HBASE-13832
Project: HBase
Issue Type: Sub-task
Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang

When there are fewer than 3 data nodes, we get a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting.

{noformat}
2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
{noformat}

One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
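The proposal above (on a sync failure, roll the log first and abort only if that also fails) can be sketched roughly as follows. This is a minimal illustration, not the WALProcedureStore code: the `LogStore` interface and the `SyncWithRoll` class are hypothetical names invented for this sketch.

```java
import java.io.IOException;

/** Hypothetical stand-in for a WAL-backed store; NOT the real WALProcedureStore API. */
interface LogStore {
    void sync() throws IOException;     // flush pending entries to the current log file
    void rollLog() throws IOException;  // close the current log file and open a new one
}

final class SyncWithRoll {
    /**
     * Try to sync; on IOException, roll the log and retry the sync once.
     * Returns true if the data was synced, false if the caller should abort.
     */
    static boolean syncOrRoll(LogStore store) {
        try {
            store.sync();
            return true;
        } catch (IOException syncFailure) {
            // Possibly recoverable: the current file's pipeline is bad, a new file may work.
            try {
                store.rollLog();
                store.sync();
                return true;
            } catch (IOException rollFailure) {
                // Rolling (or syncing to the new log) failed as well: give up.
                return false;
            }
        }
    }
}
```

A caller would abort the master only when `syncOrRoll` returns false, which mirrors the "syncer requests a roll, roller aborts on failure" split described in the comment.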
[jira] [Updated] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-13831:
---
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

TestHBaseFsck passed in QA run. Thanks for the patch, Stephen.

TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch

TestHBaseFsck#testParallelHbck is flaky against a HADOOP-2.6+ environment. The idea of the test is that when 2 HBCK operations run simultaneously, the 2nd HBCK fails without retrying, because creating the lock file fails once the 1st HBCK has already created it. However, with HADOOP-2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException (see HBASE-13574 for more details: "It seems that the test is broken due to the new create retry policy in hadoop 2.6. The Namenode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds).")

When I run the TestHBaseFsck#testParallelHbck test against HADOOP-2.7 in a Windows environment (HBASE is branch-1.1) multiple times, the result is unpredictable (sometimes it succeeds, sometimes it fails, and failures outnumber successes).

The fix is trivial: leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
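To illustrate why the maximum wait time matters for the test described above, here is a small sketch of exclusive lock-file creation with a bounded retry window. It uses the local filesystem through java.nio rather than HDFS, and `ExclusiveLock`, `tryLock`, and their parameters are hypothetical names for this sketch only; the actual HBCK locking code and the HBASE-13732 settings are not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExclusiveLock {
    /**
     * Try to create lockFile exclusively, retrying until maxWaitMillis has elapsed.
     * Returns true if this caller created the file, false if it existed for the whole wait.
     */
    static boolean tryLock(Path lockFile, long maxWaitMillis, long retryIntervalMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            try {
                Files.createFile(lockFile);  // atomic: throws if the file already exists
                return true;
            } catch (FileAlreadyExistsException held) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;            // someone else held the lock for the whole wait
                }
                try {
                    Thread.sleep(retryIntervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With a large `maxWaitMillis` the second caller blocks for a long time before failing, so a test that expects a prompt failure becomes timing-sensitive; a small value makes the failure quick and predictable.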
[jira] [Updated] (HBASE-13682) Compute HDFS locality in parallel.
[ https://issues.apache.org/jira/browse/HBASE-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Changgeng Li updated HBASE-13682:
---
Assignee: (was: Changgeng Li)

Compute HDFS locality in parallel.
---
Key: HBASE-13682
URL: https://issues.apache.org/jira/browse/HBASE-13682
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark

Right now, when the balancer needs to know about region locality, it asks the cache serially. When the cache is empty or expired, it goes to the NN. On larger clusters with lots of blocks this can be really slow. That means the balancer is unusable while masters are being restarted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
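The idea in the description (issue the per-region locality lookups concurrently instead of one after another) might look like the sketch below. `ParallelLocality`, `computeAll`, and the `lookup` function are hypothetical names; `lookup` stands in for whatever slow call fetches a region's locality on a cache miss, and the balancer's real data structures are not shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelLocality {
    /** Compute a locality value for every region using a fixed-size thread pool. */
    static Map<String, Float> computeAll(List<String> regions,
                                         Function<String, Float> lookup,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every lookup first so they run concurrently.
            Map<String, Future<Float>> pending = new LinkedHashMap<>();
            for (String region : regions) {
                Callable<Float> task = () -> lookup.apply(region);
                pending.put(region, pool.submit(task));
            }
            // Then collect the results in submission order.
            Map<String, Float> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Float>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing locality", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("locality lookup failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The total wall-clock time is then bounded by the slowest batches rather than by the sum of all lookups, at the cost of more concurrent requests to the NameNode.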
[jira] [Created] (HBASE-13834) Evict count not properly passed to passed to HeapMemoryTuner.
Abhilash created HBASE-13834:
---
Summary: Evict count not properly passed to passed to HeapMemoryTuner.
Key: HBASE-13834
URL: https://issues.apache.org/jira/browse/HBASE-13834
Project: HBase
Issue Type: Bug
Components: hbase, regionserver
Affects Versions: 2.0.0
Reporter: Abhilash
Assignee: Abhilash
Priority: Trivial

The evict count that is calculated in the tune function of the HeapMemoryManager class and passed to HeapMemoryTuner via TunerContext is miscalculated. It is supposed to be the evict count between two intervals, but it's not.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
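For reference, an "evict count between two intervals" is usually obtained by differencing a cumulative counter, as in the sketch below. This assumes the cache exposes a monotonically increasing eviction counter; `EvictDelta` and `sinceLastInterval` are hypothetical names, and the sketch does not claim to show what the attached patch changes.

```java
final class EvictDelta {
    // Cumulative eviction count observed at the previous tuning interval.
    private long lastEvictCount;

    /** Return the evictions since the previous call, given the cumulative counter. */
    long sinceLastInterval(long cumulativeEvictCount) {
        long delta = cumulativeEvictCount - lastEvictCount;
        lastEvictCount = cumulativeEvictCount;  // remember for the next interval
        return delta;
    }
}
```

The common mistakes with this pattern are passing the cumulative value itself, or forgetting to update the remembered value, so that every interval reports growth since startup.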
[jira] [Commented] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571889#comment-14571889 ]

stack commented on HBASE-13817:
---
bq. May be skip?

I'd say so. Just confuses. Setting it though it is the default and we don't set it anywhere else we do BB messing. Sorry for my being fixated on this.

+1 after doing above. Just commit. Thanks for adding the jmh.

ByteBufferOuputStream - add writeInt support
---
Key: HBASE-13817
URL: https://issues.apache.org/jira/browse/HBASE-13817
Project: HBase
Issue Type: Sub-task
Components: Scanners
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Fix For: 2.0.0
Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, benchmark.zip

While writing Cells to this stream, to make the CellBlock ByteBuffer, we write the length of the cell as an int. We use StreamUtils to do this, which writes each byte one after the other, so there are 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream we have the overhead of checking the size limit and possibly growing on every write call. Internally this stream writes to a ByteBuffer, and inside the ByteBuffer implementations there are position and limit checks again. If we write these lengths as an int in one go, we can reduce this overhead.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
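The difference the description points at can be shown directly on a java.nio.ByteBuffer. The sketch below is only an illustration of "four single-byte writes versus one int write"; `IntWrites` and its methods are hypothetical names and are not the StreamUtils or ByteBufferOutputStream code from the patch.

```java
import java.nio.ByteBuffer;

final class IntWrites {
    /** Four separate single-byte puts, big-endian, as a byte-at-a-time utility would do. */
    static void writeIntByteByByte(ByteBuffer buf, int v) {
        buf.put((byte) (v >>> 24));
        buf.put((byte) (v >>> 16));
        buf.put((byte) (v >>> 8));
        buf.put((byte) v);
    }

    /** One bulk put: a single bounds check instead of four. */
    static void writeIntOnce(ByteBuffer buf, int v) {
        // A new ByteBuffer is big-endian by default, so the bytes match the method above.
        buf.putInt(v);
    }
}
```

Both methods leave the same four bytes in the buffer; the second simply does the bounds and position bookkeeping once per int rather than once per byte.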
[jira] [Updated] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-13833:
---
Attachment: HBASE-13833.01.branch-1.1.patch

This version is careful to clean up the connection after itself. Hat-tip to [~enis]. Also fix the log line.

LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
---
Key: HBASE-13833
URL: https://issues.apache.org/jira/browse/HBASE-13833
Project: HBase
Issue Type: Bug
Affects Versions: 1.1.0.1
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13833.00.branch-1.1.patch, HBASE-13833.01.branch-1.1.patch

Seems HBASE-13328 wasn't quite sufficient.

{noformat}
2015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS
2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled
2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK?
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2
...
...
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception.
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem
2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745)
...
...
...
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged.
2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708)
2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:542)
[jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571850#comment-14571850 ] stack commented on HBASE-13811: --- Updated https://reviews.apache.org/r/34963/ Splitting WALs, we are filtering out too many edits - DATALOSS --- Key: HBASE-13811 URL: https://issues.apache.org/jira/browse/HBASE-13811 Project: HBase Issue Type: Bug Components: wal Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0 Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place so can only think it the cause (but cannot see how). When we split the logs, we are skipping legit edits. Digging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13811:
---
Attachment: 13811.branch-1.txt

Refactored, moving everything to do with sequenceid accounting into its own package-protected class. Then added tests for the sequenceid accounting. Added [~Apache9]'s test to the patch too.

[~Apache9] I changed TestGetLastFlushedSequenceId. The supposition that the region flushed id would be greater than the store flush id didn't make sense to me -- perhaps I am missing something. It made more sense that they would be equal after a flush. See what you think.

Splitting WALs, we are filtering out too many edits - DATALOSS
---
Key: HBASE-13811
URL: https://issues.apache.org/jira/browse/HBASE-13811
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 2.0.0, 1.2.0
Reporter: stack
Assignee: stack
Priority: Critical
Fix For: 2.0.0, 1.2.0
Attachments: 13811.branch-1.txt, 13811.branch-1.txt, 13811.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch

I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place, so I can only think it is the cause (but cannot see how). When we split the logs, we are skipping legit edits. Digging.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13834) Evict count not properly passed to passed to HeapMemoryTuner.
[ https://issues.apache.org/jira/browse/HBASE-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhilash updated HBASE-13834:
---
Status: Patch Available (was: Open)

Evict count not properly passed to passed to HeapMemoryTuner.
---
Key: HBASE-13834
URL: https://issues.apache.org/jira/browse/HBASE-13834
Project: HBase
Issue Type: Bug
Components: hbase, regionserver
Affects Versions: 2.0.0
Reporter: Abhilash
Assignee: Abhilash
Priority: Trivial
Labels: easyfix
Attachments: EvictCountBug.patch

The evict count that is calculated in the tune function of the HeapMemoryManager class and passed to HeapMemoryTuner via TunerContext is miscalculated. It is supposed to be the evict count between two intervals, but it's not.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13834) Evict count not properly passed to passed to HeapMemoryTuner.
[ https://issues.apache.org/jira/browse/HBASE-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhilash updated HBASE-13834:
---
Attachment: EvictCountBug.patch

Evict count not properly passed to passed to HeapMemoryTuner.
---
Key: HBASE-13834
URL: https://issues.apache.org/jira/browse/HBASE-13834
Project: HBase
Issue Type: Bug
Components: hbase, regionserver
Affects Versions: 2.0.0
Reporter: Abhilash
Assignee: Abhilash
Priority: Trivial
Labels: easyfix
Attachments: EvictCountBug.patch

The evict count that is calculated in the tune function of the HeapMemoryManager class and passed to HeapMemoryTuner via TunerContext is miscalculated. It is supposed to be the evict count between two intervals, but it's not.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571937#comment-14571937 ]

Anoop Sam John commented on HBASE-13817:
---
bq. May be the first version is better?

Can we have a consensus? Ram, is it ok to commit V2 (after Stack's comment fix)? StreamUtils is a generic util API that we provided. We get BBOS when we write to CellBlock in the RPC layer. Ya, today the util API is called only in this place. Still !!! Maybe later we can see whether we can move this to some common place.

ByteBufferOuputStream - add writeInt support
---
Key: HBASE-13817
URL: https://issues.apache.org/jira/browse/HBASE-13817
Project: HBase
Issue Type: Sub-task
Components: Scanners
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Fix For: 2.0.0
Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, benchmark.zip

While writing Cells to this stream, to make the CellBlock ByteBuffer, we write the length of the cell as an int. We use StreamUtils to do this, which writes each byte one after the other, so there are 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream we have the overhead of checking the size limit and possibly growing on every write call. Internally this stream writes to a ByteBuffer, and inside the ByteBuffer implementations there are position and limit checks again. If we write these lengths as an int in one go, we can reduce this overhead.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570710#comment-14570710 ]

Hadoop QA commented on HBASE-13829:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737109/HBASE-13829.patch against master branch at commit fad545652fc330d11061a1608e7812dade7f0845.
ATTACHMENT ID: 12737109

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+hbase set_quota TYPE = THROTTLE, THROTTLE_TYPE = READ, USER = 'u1', TABLE = 't2', LIMIT = '5K/min'
+hbase set_quota TYPE = THROTTLE, THROTTLE_TYPE = WRITE, USER = 'u1', NAMESPACE = 'ns2', LIMIT = NONE
+hbase set_quota TYPE = THROTTLE, THROTTLE_TYPE = READ, NAMESPACE = 'ns1', LIMIT = '10req/sec'
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14268//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14268//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14268//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14268//console

This message is automatically generated.

Add more ThrottleType
---
Key: HBASE-13829
URL: https://issues.apache.org/jira/browse/HBASE-13829
Project: HBase
Issue Type: Improvement
Components: Client
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Fix For: 2.0.0
Attachments: HBASE-13829.patch

HBASE-11598 added simple throttling for HBase, but the client doesn't let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE.

REVIEW BOARD: https://reviews.apache.org/r/34989/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13658) Improve the test run time for TestAccessController class
[ https://issues.apache.org/jira/browse/HBASE-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570701#comment-14570701 ] Ashish Singhi commented on HBASE-13658: --- I have attached patches for those branches. Thanks Improve the test run time for TestAccessController class Key: HBASE-13658 URL: https://issues.apache.org/jira/browse/HBASE-13658 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.12 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.13, 1.2.0 Attachments: 13658.patch, HBASE-13658-0.98-1.patch, HBASE-13658-branch-1.0.patch, HBASE-13658-branch-1.1.patch, HBASE-13658-v1.patch, HBASE-13658-v2.patch, HBASE-13658.patch Improve the test run time for TestAccessController class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570685#comment-14570685 ]

Samir Ahmic commented on HBASE-13337:
---
The issue is directly related to changing the rpc client implementation from RpcClientImpl to AsyncRpcClient. I tested the master branch with RpcClientImpl and the issue is gone. Maybe we should check how we handle connection failures in AsyncRpcClient.

Table regions are not assigning back, after restarting all regionservers at once.
---
Key: HBASE-13337
URL: https://issues.apache.org/jira/browse/HBASE-13337
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 2.0.0
Reporter: Y. SREENIVASULU REDDY
Priority: Blocker
Fix For: 2.0.0
Attachments: HBASE-13337-v2.patch, HBASE-13337.patch

Regions of the table are continuously in state=FAILED_CLOSE.

{noformat}
Region    State    RIT time (ms)
8f62e819b356736053e06240f7f7c6fd    t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818    113929
caf59209ae65ea80fca6bdc6996a7d68    t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691    113929
db52a74988f71e5cf257bbabf31f26f3    t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691    113920
43f3a65b9f9ff283f598c5450feab1f8    t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818    113920
{noformat}

*Steps to reproduce:*
1. Start an HBase cluster with more than one regionserver.
2. Create a table with precreated regions (let's say 15 regions).
3. Make sure the regions are well balanced.
4. Restart all the Regionserver processes at once across the cluster, but not the HMaster process.
5. After restarting, the Regionservers successfully connect to the HMaster.

*Bug:* But no regions are assigned back to the Regionservers.

*Master log shows as follows:*
{noformat}
2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPENsn=VM1,16040,1427362531818
2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570434#comment-14570434 ] Hudson commented on HBASE-13826: SUCCESS: Integrated in HBase-1.2 #131 (See [https://builds.apache.org/job/HBase-1.2/131/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev e8914f26d2fcb3958d839f6c69d7c4a308cd5512) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570467#comment-14570467 ] Hudson commented on HBASE-13826: FAILURE: Integrated in HBase-1.1 #521 (See [https://builds.apache.org/job/HBase-1.1/521/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev ff5f02efeb297d3e1bf5dc5c38e80c61a39e81fc) * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13806) Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
[ https://issues.apache.org/jira/browse/HBASE-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingcheng Du updated HBASE-13806:
---
Attachment: HBASE-13806-V3.diff

Updated the patch to handle corrupt files while reading.

Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
---
Key: HBASE-13806
URL: https://issues.apache.org/jira/browse/HBASE-13806
Project: HBase
Issue Type: Sub-task
Components: mob
Affects Versions: hbase-11339
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Fix For: hbase-11339
Attachments: HBASE-13806-V2.diff, HBASE-13806-V3.diff, HBASE-13806.diff

Currently HFileCorruptionChecker only checks the files in regions. We need to check the mob files too if there are mob-enabled columns in that table.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570596#comment-14570596 ] Hudson commented on HBASE-13826: FAILURE: Integrated in HBase-0.98 #1019 (See [https://builds.apache.org/job/HBase-0.98/1019/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev ea7d2c3c0406324de649c9f44978e5643de758b7) * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java HBASE-13826 Unable to create table when group acls are appropriately set; ADDENDUM - Get rid of connection api in unit test (ssrungarapu: rev 98e63c3cd27ad001c3ae82dcd9d32c580e26ff95) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13806) Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
[ https://issues.apache.org/jira/browse/HBASE-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570608#comment-14570608 ]

Jingcheng Du commented on HBASE-13806:
---
In the first approach, how about having a coprocessor (endpoint) to scan per region?

Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
---
Key: HBASE-13806
URL: https://issues.apache.org/jira/browse/HBASE-13806
Project: HBase
Issue Type: Sub-task
Components: mob
Affects Versions: hbase-11339
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Fix For: hbase-11339
Attachments: HBASE-13806-V2.diff, HBASE-13806-V3.diff, HBASE-13806.diff

Currently HFileCorruptionChecker only checks the files in regions. We need to check the mob files too if there are mob-enabled columns in that table.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13806) Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
[ https://issues.apache.org/jira/browse/HBASE-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570609#comment-14570609 ]

Jingcheng Du commented on HBASE-13806:
---
In the first approach, how about having a coprocessor (endpoint) to scan per region?

Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
---
Key: HBASE-13806
URL: https://issues.apache.org/jira/browse/HBASE-13806
Project: HBase
Issue Type: Sub-task
Components: mob
Affects Versions: hbase-11339
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Fix For: hbase-11339
Attachments: HBASE-13806-V2.diff, HBASE-13806-V3.diff, HBASE-13806.diff

Currently HFileCorruptionChecker only checks the files in regions. We need to check the mob files too if there are mob-enabled columns in that table.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13658) Improve the test run time for TestAccessController class
[ https://issues.apache.org/jira/browse/HBASE-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13658: -- Attachment: HBASE-13658-branch-1.1.patch HBASE-13658-branch-1.0.patch Improve the test run time for TestAccessController class Key: HBASE-13658 URL: https://issues.apache.org/jira/browse/HBASE-13658 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.12 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.13, 1.2.0 Attachments: 13658.patch, HBASE-13658-0.98-1.patch, HBASE-13658-branch-1.0.patch, HBASE-13658-branch-1.1.patch, HBASE-13658-v1.patch, HBASE-13658-v2.patch, HBASE-13658.patch Improve the test run time for TestAccessController class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570353#comment-14570353 ] Hudson commented on HBASE-13826: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #971 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/971/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev ea7d2c3c0406324de649c9f44978e5643de758b7) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570349#comment-14570349 ] Srikanth Srungarapu commented on HBASE-13826: - Too eager to get this fix in :). Thanks for bringing this to notice. Pushed the addendum. Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570524#comment-14570524 ] Ashish Singhi commented on HBASE-13829: --- Also, can you add this example and some explanation to the book? Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13658) Improve the test run time for TestAccessController class
[ https://issues.apache.org/jira/browse/HBASE-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570354#comment-14570354 ] Srikanth Srungarapu commented on HBASE-13658: - Can you please upload the patches for those branches? Improve the test run time for TestAccessController class Key: HBASE-13658 URL: https://issues.apache.org/jira/browse/HBASE-13658 Project: HBase Issue Type: Sub-task Components: test Affects Versions: 0.98.12 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.13, 1.2.0 Attachments: 13658.patch, HBASE-13658-0.98-1.patch, HBASE-13658-v1.patch, HBASE-13658-v2.patch, HBASE-13658.patch Improve the test run time for TestAccessController class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13806) Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
[ https://issues.apache.org/jira/browse/HBASE-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-13806: - Attachment: HBASE-13806-V2.diff Update the patch according to Anoop's comments to execute the mob checker in parallel. Check the mob files when there are mob-enabled columns in HFileCorruptionChecker Key: HBASE-13806 URL: https://issues.apache.org/jira/browse/HBASE-13806 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13806-V2.diff, HBASE-13806.diff HFileCorruptionChecker currently checks only the files in regions. We need to check the mob files too if there are mob-enabled columns in that table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-13829: --- Description: HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ was: HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570474#comment-14570474 ] Samir Ahmic commented on HBASE-13337: - Thanks for the review [~eclark]. I agree the connection should be closed; I will try to resolve this some other way. Any suggestions? Table regions are not assigning back, after restarting all regionservers at once. - Key: HBASE-13337 URL: https://issues.apache.org/jira/browse/HBASE-13337 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 2.0.0 Reporter: Y. SREENIVASULU REDDY Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-13337-v2.patch, HBASE-13337.patch Regions of the table are continuously in state=FAILED_CLOSE. {noformat} RegionState RIT time (ms) 8f62e819b356736053e06240f7f7c6fd t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929 caf59209ae65ea80fca6bdc6996a7d68 t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929 db52a74988f71e5cf257bbabf31f26f3 t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920 43f3a65b9f9ff283f598c5450feab1f8 t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920 {noformat} *Steps to reproduce:* 1. Start an HBase cluster with more than one regionserver. 2. Create a table with pre-created regions (say, 15 regions). 3. Make sure the regions are well balanced. 4. Restart all the regionserver processes at once across the cluster, leaving the HMaster process running. 5. After the restart, the regionservers successfully connect to the HMaster. *Bug:* But no regions are assigned back to the regionservers. 
*Master log shows as follows:* {noformat} 2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPENsn=VM1,16040,1427362531818 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
with state=PENDING_CLOSE 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of
[jira] [Commented] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570472#comment-14570472 ] Dev Lakhani commented on HBASE-13825: - Thanks for the suggestion, [~apurtell]. This is what the stack trace suggests, but could you please help with a code snippet? When you say change it in the client, do you mean the HBase client or the application client calling the get? I am only permitted to use pre-built HBase jars from Maven, so I cannot change HBase code in any way. The message's reference to CodedInputStream.setSizeLimit() suggests a static method, which does not exist. Furthermore, I have no instances of CodedInputStream in my application client, so where should I set this size limit? Is it worth adding an HBase parameter for this? Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. 
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Scan and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570473#comment-14570473 ] Hadoop QA commented on HBASE-13827: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737092/HBASE-13827.patch against master branch at commit fad545652fc330d11061a1608e7812dade7f0845. ATTACHMENT ID: 12737092 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14267//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14267//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14267//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14267//console This message is automatically generated. Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() happens on the HFileScanner. Right now HFileScanner has no close() at all; this issue will add it, and the StoreFileScanner will call it from its own close(). In KVHeap, when we see that one of the child scanners has run out of cells, we remove it from the PriorityQueue and close it. StoreScanner does the same kind of thing. But once close() returns blocks, this kind of early close is not correct: there might still be cells created out of these cached blocks. This Jira aims at changing these container scanners so that they do not close early. When a child scanner is no longer required, the container will stop using it completely but will not call close(). Instead, the child is added to another list for a delayed close, and that list is closed when close() happens on the container scanner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
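The delayed-close idea described in HBASE-13827 could be sketched roughly as follows. This is only an illustration under assumptions, not the attached patch: ChildScanner, DelayedCloseHeap and scannersForDelayedClose are hypothetical names, and plain integers stand in for cells.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical child scanner: yields sorted ints and records whether close() was called. */
final class ChildScanner {
  private final Iterator<Integer> cells;
  private Integer current;
  boolean closed = false;

  ChildScanner(List<Integer> sortedCells) {
    this.cells = sortedCells.iterator();
    this.current = cells.hasNext() ? cells.next() : null;
  }

  /** Current cell, or null once the scanner has run out of cells. */
  Integer peek() {
    return current;
  }

  /** Returns the current cell and advances to the following one. */
  Integer next() {
    Integer result = current;
    current = cells.hasNext() ? cells.next() : null;
    return result;
  }

  void close() {
    closed = true;
  }
}

/** Hypothetical container scanner that merges children and defers their close(). */
final class DelayedCloseHeap {
  private final PriorityQueue<ChildScanner> heap =
      new PriorityQueue<ChildScanner>((a, b) -> a.peek().compareTo(b.peek()));
  private final List<ChildScanner> scannersForDelayedClose = new ArrayList<>();

  DelayedCloseHeap(List<ChildScanner> children) {
    for (ChildScanner child : children) {
      if (child.peek() != null) {
        heap.add(child);
      } else {
        scannersForDelayedClose.add(child); // already empty: park it, do not close yet
      }
    }
  }

  /** Returns the next smallest cell, or null when every child is exhausted. */
  Integer next() {
    ChildScanner top = heap.poll();
    if (top == null) {
      return null;
    }
    Integer cell = top.next();
    if (top.peek() != null) {
      heap.add(top); // still has cells: put it back in the queue
    } else {
      scannersForDelayedClose.add(top); // exhausted: stop using it, but keep it open
    }
    return cell;
  }

  /** Closes the parked children and any children still in the queue. */
  void close() {
    for (ChildScanner scanner : scannersForDelayedClose) {
      scanner.close();
    }
    scannersForDelayedClose.clear();
    ChildScanner scanner;
    while ((scanner = heap.poll()) != null) {
      scanner.close();
    }
  }
}
```

The point of the sketch is that an exhausted child is never closed inside next(), so anything it handed out earlier stays valid until the caller closes the container.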
[jira] [Updated] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic updated HBASE-13337: Status: Open (was: Patch Available) Table regions are not assigning back, after restarting all regionservers at once. - Key: HBASE-13337 URL: https://issues.apache.org/jira/browse/HBASE-13337 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 2.0.0 Reporter: Y. SREENIVASULU REDDY Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-13337-v2.patch, HBASE-13337.patch Regions of the table are continuously in state=FAILED_CLOSE. {noformat} RegionState RIT time (ms) 8f62e819b356736053e06240f7f7c6fd t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929 caf59209ae65ea80fca6bdc6996a7d68 t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929 db52a74988f71e5cf257bbabf31f26f3 t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920 43f3a65b9f9ff283f598c5450feab1f8 t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920 {noformat} *Steps to reproduce:* 1. Start an HBase cluster with more than one regionserver. 2. Create a table with pre-created regions (say, 15 regions). 3. Make sure the regions are well balanced. 4. Restart all the regionserver processes at once across the cluster, leaving the HMaster process running. 5. After the restart, the regionservers successfully connect to the HMaster. *Bug:* But no regions are assigned back to the regionservers. 
*Master log shows as follows:* {noformat} 2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPENsn=VM1,16040,1427362531818 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
with state=PENDING_CLOSE 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned
[jira] [Created] (HBASE-13830) Hbase REVERSED may throw Exception sometimes
ryan.jin created HBASE-13830: Summary: Hbase REVERSED may throw Exception sometimes Key: HBASE-13830 URL: https://issues.apache.org/jira/browse/HBASE-13830 Project: HBase Issue Type: Bug Affects Versions: 0.98.1 Reporter: ryan.jin Run a scan in the HBase shell: {code} scan 'analytics_access', {ENDROW => '9223370603647713262-flume01.hadoop-10.32.117.111-373563509', LIMIT => 10, REVERSED => true} {code} It throws an exception: {code} java.io.IOException: java.io.IOException: Could not seekToPreviousRow StoreFileScanner[HFileScanner for reader reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6, compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put, lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put, avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0] to key 9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0 at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448) at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88) at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128) at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: On-disk size without header provided is 47701, but block header contains 10134. Block offset: -1, data starts with: DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00' at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451) at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:413) ... 
17 more at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.6.0_65] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) ~[na:1.6.0_65] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) ~[na:1.6.0_65] at java.lang.reflect.Constructor.newInstance(Constructor.java:513) ~[na:1.6.0_65] at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) ~[hadoop-common-2.3.0-cdh5.1.0.jar:na] at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) ~[hadoop-common-2.3.0-cdh5.1.0.jar:na] at
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570532#comment-14570532 ] Hudson commented on HBASE-13826: FAILURE: Integrated in HBase-TRUNK #6543 (See [https://builds.apache.org/job/HBase-TRUNK/6543/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev fad545652fc330d11061a1608e7812dade7f0845) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13811) Splitting WALs, we are filtering out too many edits - DATALOSS
[ https://issues.apache.org/jira/browse/HBASE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570430#comment-14570430 ] Hadoop QA commented on HBASE-13811: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737087/HBASE-13811-v1.testcase.patch against master branch at commit 722fd17069a302f4de12c22212d54d80bed81aed. ATTACHMENT ID: 12737087 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14266//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14266//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14266//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14266//console This message is automatically generated. Splitting WALs, we are filtering out too many edits - DATALOSS --- Key: HBASE-13811 URL: https://issues.apache.org/jira/browse/HBASE-13811 Project: HBase Issue Type: Bug Components: wal Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0 Attachments: 13811.branch-1.txt, 13811.txt, HBASE-13811-v1.testcase.patch, HBASE-13811.testcase.patch I've been running ITBLLs against branch-1 around HBASE-13616 (move of ServerShutdownHandler to pv2). I have come across an instance of dataloss. My patch for HBASE-13616 was in place so can only think it the cause (but cannot see how). When we split the logs, we are skipping legit edits. Digging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-13829: --- Status: Patch Available (was: Open) Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13829: -- Status: Patch Available (was: Open) Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570518#comment-14570518 ] Ashish Singhi commented on HBASE-13829: --- Patch looks OK to me. bq. The request(read/write) limit can be expressed using the form 100req/sec, 100req/min Can we explain in more detail to the HBase shell user that they can set a quota on reads, on writes, or on both together (i.e., read + write)? Also, can you add a unit test where the quota is set on read_num and assert that no quota control applies to write_num requests? Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let the user set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
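To make the review comment on HBASE-13829 concrete, here is a small standalone sketch of how a limit such as 100req/sec or 100req/min might be parsed, and of the requested check that a read-only quota leaves writes unthrottled. Everything in it is an assumption for illustration: RequestKind, SketchThrottleType and RequestLimit are hypothetical names, not HBase classes, and only the two forms quoted in the comment are accepted.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical request kinds seen by the throttle. */
enum RequestKind { READ, WRITE }

/** Hypothetical throttle types echoing the names in the issue; not HBase's own enum. */
enum SketchThrottleType {
  REQUEST_NUM, READ_NUM, WRITE_NUM;

  /** True if a quota of this type counts requests of the given kind. */
  boolean limits(RequestKind kind) {
    switch (this) {
      case READ_NUM:
        return kind == RequestKind.READ;
      case WRITE_NUM:
        return kind == RequestKind.WRITE;
      default:
        return true; // REQUEST_NUM covers reads and writes together
    }
  }
}

/** Hypothetical parsed form of a limit such as "100req/sec". */
final class RequestLimit {
  // Only the two forms quoted in the comment are handled here.
  private static final Pattern FORM = Pattern.compile("(\\d+)req/(sec|min)");

  final long requests;
  final TimeUnit unit;

  RequestLimit(long requests, TimeUnit unit) {
    this.requests = requests;
    this.unit = unit;
  }

  /** Parses "<count>req/sec" or "<count>req/min"; rejects anything else. */
  static RequestLimit parse(String text) {
    Matcher m = FORM.matcher(text.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("expected <count>req/<sec|min>, got: " + text);
    }
    long count = Long.parseLong(m.group(1));
    TimeUnit unit = m.group(2).equals("sec") ? TimeUnit.SECONDS : TimeUnit.MINUTES;
    return new RequestLimit(count, unit);
  }
}
```

With these definitions, SketchThrottleType.READ_NUM.limits(RequestKind.WRITE) is false, which is the property the requested unit test would assert against the real implementation.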
[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570334#comment-14570334 ] ramkrishna.s.vasudevan commented on HBASE-12295: Given that we can solve this without using Result we could continue the way we handle with scanners and scanners doing the ref counting part. I would any way think that if we use Result, may be we need not increment every cell coming from that block. Per block used in the result we can increment the count. But doing anything with Result would need to have a Server side result object that may do some extra APIs/methods to handle these return/close stuff. As Stack points out bq.I would say that the creation of the CellBlock in memory is something we would like to get away from some day. If this has to happen then its better we handle things with the Result - but incrementing the ref count for every cell in the block may be we can see if there is another way. For now we are not handling this through Result and handling things using CellBlock and existing scanners. Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks does not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots
[ https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570387#comment-14570387 ] Hudson commented on HBASE-13356: SUCCESS: Integrated in HBase-TRUNK #6542 (See [https://builds.apache.org/job/HBase-TRUNK/6542/]) HBASE-13356 HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots (Andrew Mains) (tedyu: rev 722fd17069a302f4de12c22212d54d80bed81aed) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableSnapshotInputFormat.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormatImpl.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapred/TestMultiTableSnapshotInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatTestBase.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/MultiTableSnapshotInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormat.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormatImpl.java HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots -- Key: HBASE-13356 
URL: https://issues.apache.org/jira/browse/HBASE-13356 Project: HBase Issue Type: New Feature Components: mapreduce Reporter: Andrew Mains Assignee: Andrew Mains Priority: Minor Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, HBASE-13356.2.patch, HBASE-13356.3.patch, HBASE-13356.4.patch, HBASE-13356.patch Currently, HBase supports the pushing of multiple scans to mapreduce jobs over live tables (via MultiTableInputFormat) but only supports a single scan for mapreduce jobs over table snapshots. It would be handy to support multiple scans over snapshots as well, probably through another input format (MultiTableSnapshotInputFormat?). To mimic the functionality present in MultiTableInputFormat, the new input format would likely have to take in the names of all snapshots used in addition to the scans. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13829) Add more ThrottleType
Guanghao Zhang created HBASE-13829: -- Summary: Add more ThrottleType Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570456#comment-14570456 ] Hudson commented on HBASE-13826: FAILURE: Integrated in HBase-1.0 #945 (See [https://builds.apache.org/job/HBase-1.0/945/]) HBASE-13826 Unable to create table when group acls are appropriately set. (ssrungarapu: rev 0214c125ec3f7245ac367bff60ff727703868bd9) * hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController2.java Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-13829: --- Attachment: HBASE-13829.patch Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-13829: --- Status: Open (was: Patch Available) Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570442#comment-14570442 ] Guanghao Zhang commented on HBASE-13829: [~mbertozzi][~ashish singhi] please review. Thanks. Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13806) Check the mob files when there are mob-enabled columns in HFileCorruptionChecker
[ https://issues.apache.org/jira/browse/HBASE-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570580#comment-14570580 ] Jingcheng Du commented on HBASE-13806: -- Hi [~jmhsieh], [~anoopsamjohn], [~ram_krish]. This patch checks for corrupt mob files. Next we need a tool to check the integrity of mob cells. I think Jon's idea is very good: read the hfile and check whether its mob file is present. If the mob file is not there, what should we do? Add a delete marker for this ref cell from the table client? Or alternatively, could we improve the scanner? Currently, if the mob file is not present or is corrupt, a cell with a null value is returned. Could we just filter out these cells and read further in the scanner? That way the dangling ref cells are not visible, and they can be deleted when they expire. Please advise. Thanks! Check the mob files when there are mob-enabled columns in HFileCorruptionChecker Key: HBASE-13806 URL: https://issues.apache.org/jira/browse/HBASE-13806 Project: HBase Issue Type: Sub-task Components: mob Affects Versions: hbase-11339 Reporter: Jingcheng Du Assignee: Jingcheng Du Fix For: hbase-11339 Attachments: HBASE-13806-V2.diff, HBASE-13806-V3.diff, HBASE-13806.diff Currently HFileCorruptionChecker only checks the files in regions. We need to check the mob files too if there are mob-enabled columns in that table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570867#comment-14570867 ] Bhupendra Kumar Jain commented on HBASE-13702: -- As per the current patch, dry-run executes only the Map task, so it is useful only when the Map task carries a lot of extra logic (parsing, validation, transformation, etc.). Dry run can execute that logic and output the errors. But there may also be logic in the Combiner and Reducer phases, which dry-run will not check. So I think it is better to rename the dry-run function to *dry-run-map*. That would be much clearer. ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Attachments: HBASE-13702.patch ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows are corrupted in an input, rather than failing on one row at a time, seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such kinds of tools, which essentially does a quick run of the tool without making any changes but reports any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
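As an illustration of the "log bad rows, and use if-else to skip the writes in a dry run" idea from the HBASE-13702 description, here is a minimal sketch. It is not the ImportTsv code or the attached patch: the class TsvDryRunImporter, its fields, and the "wrong column count means bad row" rule are all hypothetical, chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HBase or ImportTsv.
final class TsvDryRunImporter {
  private final int expectedColumns;
  private final boolean dryRun;

  // Bad rows are collected (standing in for "logged") rather than failing the run.
  final List<String> badRows = new ArrayList<>();
  // Stand-in for the KVs/mutations a real import would write out.
  final List<String[]> written = new ArrayList<>();

  TsvDryRunImporter(int expectedColumns, boolean dryRun) {
    this.expectedColumns = expectedColumns;
    this.dryRun = dryRun;
  }

  void process(String line) {
    // Limit -1 keeps trailing empty fields so the column count is exact.
    String[] fields = line.split("\t", -1);
    if (fields.length != expectedColumns) {
      badRows.add(line); // record the offending row and keep going
      return;
    }
    if (!dryRun) {
      written.add(fields); // the only side effect, skipped in a dry run
    }
  }
}
```

With dryRun set to true, every row still goes through parsing and validation, so all bad rows are reported in one pass while nothing is written.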
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570932#comment-14570932 ] Ted Yu commented on HBASE-13829: https://reviews.apache.org/r/34989/ is private, can you make it public ? Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570967#comment-14570967 ] Ted Yu commented on HBASE-13827: lgtm Maybe submit for another QA run. Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() happens on the HFileScanner. Right now there is no close() at all; this will add it. The StoreFileScanner will call it in its own close(). In KVHeap, when we see that one of the child scanners has run out of cells, we remove it from the PriorityQueue and also close it. StoreScanner does the same kind of thing. But once close() returns blocks, this kind of early close is not correct: there might still be cells created out of these cached blocks. This Jira aims at changing these container scanners not to do the early close. When a child scanner seems to be no longer required, the container will stop using it completely but will not call close(). Instead the scanner will be added to another list for a delayed close, and it will be closed when the container scanner's close() happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
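The delayed-close behaviour described in HBASE-13827 can be sketched roughly as follows. This is a simplified model under stated assumptions, not the KeyValueHeap or StoreScanner code: ChildScanner and DelayedCloseHeap are hypothetical names, and plain ints stand in for cells.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a child scanner; ints play the role of cells.
final class ChildScanner implements Closeable, Comparable<ChildScanner> {
  private final int[] cells;
  private int pos = 0;
  boolean closed = false;

  ChildScanner(int... cells) { this.cells = cells; }

  boolean hasNext() { return pos < cells.length; }
  int peek() { return cells[pos]; }
  int next() { return cells[pos++]; }

  // Only scanners that still have cells are ever kept in the heap.
  @Override public int compareTo(ChildScanner other) {
    return Integer.compare(peek(), other.peek());
  }

  @Override public void close() { closed = true; }
}

// Hypothetical container scanner that defers closing exhausted children.
final class DelayedCloseHeap implements Closeable {
  private final PriorityQueue<ChildScanner> heap = new PriorityQueue<>();
  private final List<ChildScanner> scannersForDelayedClose = new ArrayList<>();

  DelayedCloseHeap(List<ChildScanner> children) {
    for (ChildScanner s : children) {
      if (s.hasNext()) {
        heap.add(s);
      } else {
        scannersForDelayedClose.add(s);
      }
    }
  }

  /** Returns the next smallest cell, or null when every child is exhausted. */
  Integer next() {
    ChildScanner top = heap.poll();
    if (top == null) {
      return null;
    }
    int cell = top.next();
    if (top.hasNext()) {
      heap.add(top);
    } else {
      // Exhausted: stop using it, but do NOT close it yet, because cells
      // already handed out may still refer to its underlying blocks.
      scannersForDelayedClose.add(top);
    }
    return cell;
  }

  /** Closes live children and everything parked for delayed close. */
  @Override public void close() {
    for (ChildScanner s : heap) {
      s.close();
    }
    heap.clear();
    for (ChildScanner s : scannersForDelayedClose) {
      s.close();
    }
    scannersForDelayedClose.clear();
  }
}
```

The point of the design is that a child's close() is tied to the container's lifetime rather than to the moment the child runs dry.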
[jira] [Updated] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13451: --- Status: Patch Available (was: Open) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13451: --- Attachment: HBASE-13451.patch The HFileBlockIndex's BlockReader is now an abstract class and we create two types of Block Index Readers: the ByteArrayBasedKeyBlockReader and the CellBasedBlockIndexReader. The code path is now clearly separated for index blocks that operate on byte[] and those which can operate on the serialized key of the KV format. The meta blocks and the ROW bloom come under the byte[] based index, and the data blocks and ROW_COL bloom come under the cell based index. The advantage we get is that the Cell related comparison and byte[] based comparisons can be easily distinguished and the CellComparator can be used effectively. The main perf benefit we get is that we avoid creating the KeyOnlyKV every time we do the binary search. A bonus is that Bytes.binarySearch() will now have only two flavours - a pure byte[] based binarySearch and another with Cell based binary search. Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571247#comment-14571247 ] ramkrishna.s.vasudevan edited comment on HBASE-13451 at 6/3/15 4:12 PM: The HFileBlockIndex's BlockReader is now an abstract class and we create two types of Block Index Readers. One is the ByteArrayBasedKeyBlockReader and CellBasedBlockIndexReader. The code path is now clearly separated out for index blocks that operate on byte[] and those which can operate on the serialized key of the KV format. The meta blocks and the ROW bloom come under the byte[] based index and the data blocks and ROW_COL bloom come under the cell based index. The advantage we get is that the Cell related comparison and byte[] based comparisons can be easily distinguished and the CellComparator can be used effectively. The main perf benefit we get is that we avoid creating the KeyOnlyKV every time when we do the binary search. Bonus we get is that Bytes.binarySearch() will now have only two flavours - a pure byte[] based binarySearch and another with Cell based binary search. was (Author: ram_krish): The HFileBlockIndex's BlockReader is now an abstract class and we create two types of Block Index Readers. One is the ByteArrayBasedKeyBlockReader and CellBasedBlockIndexReader. The code path is now clearly separated out for index blocks that operate on byte[] and those which can operate on the serialized key of the KV format. The meta blocks and the ROW bloom come under the byte[] based index and the data blocks and ROW_COL bloom come under the cell based index. The advantage we get is that the Cell related comparison and byte[] based comparisons can be easily distinguished and the CellComparator can be used effectively. The main perf benefit we get is that we avoid creating the KeyOnlyKV every time when we do the binary search. 
Bonus we get is that Bytes.binarySearch() will now have one two flavours - a pure byte[] based binarySearch and another with Cell based binary search. Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13378) RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels
[ https://issues.apache.org/jira/browse/HBASE-13378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570992#comment-14570992 ] John Leach commented on HBASE-13378: Seems like a nit IMO when you are comparing it to synchronizing all gets/scans in HBase... RegionScannerImpl synchronized for READ_UNCOMMITTED Isolation Levels Key: HBASE-13378 URL: https://issues.apache.org/jira/browse/HBASE-13378 Project: HBase Issue Type: New Feature Reporter: John Leach Assignee: John Leach Priority: Minor Attachments: HBASE-13378.patch, HBASE-13378.txt Original Estimate: 2h Time Spent: 2h Remaining Estimate: 0h This block of code below, coupled with the close method, could be changed so that READ_UNCOMMITTED does not synchronize.
{CODE:JAVA}
// synchronize on scannerReadPoints so that nobody calculates
// getSmallestReadPoint, before scannerReadPoints is updated.
IsolationLevel isolationLevel = scan.getIsolationLevel();
synchronized(scannerReadPoints) {
  this.readPt = getReadpoint(isolationLevel);
  scannerReadPoints.put(this, this.readPt);
}
{CODE}
This is a hotspot for me under heavy get requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
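One way to read the HBASE-13378 proposal is sketched below. It is a simplified model, not the RegionScannerImpl code or the attached patch. ReadPointTracker and its methods are hypothetical, and the sketch assumes that a READ_UNCOMMITTED scanner reads at the maximum possible read point, so leaving it out of the map can never lower the smallest read point.

```java
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in enum so the sketch is self-contained.
enum IsolationLevel { READ_COMMITTED, READ_UNCOMMITTED }

// Hypothetical model of read-point bookkeeping; not an HBase class.
final class ReadPointTracker {
  private final ConcurrentHashMap<Object, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private volatile long committedReadPoint = 0L;

  void advanceTo(long readPoint) { committedReadPoint = readPoint; }

  /** Registers a scanner and returns the read point it should use. */
  long register(Object scanner, IsolationLevel level) {
    if (level == IsolationLevel.READ_UNCOMMITTED) {
      // Assumption: this scanner sees everything, so it needs neither
      // the lock nor an entry in scannerReadPoints.
      return Long.MAX_VALUE;
    }
    // Same lock as the smallest-read-point calculation, so nobody
    // computes the minimum before the map has been updated.
    synchronized (scannerReadPoints) {
      long readPt = committedReadPoint;
      scannerReadPoints.put(scanner, readPt);
      return readPt;
    }
  }

  /** Counterpart for close(); a no-op for scanners that never registered. */
  void unregister(Object scanner) { scannerReadPoints.remove(scanner); }

  long smallestReadPoint() {
    synchronized (scannerReadPoints) {
      long min = committedReadPoint;
      for (long pt : scannerReadPoints.values()) {
        min = Math.min(min, pt);
      }
      return min;
    }
  }
}
```

Under that assumption, READ_UNCOMMITTED gets and scans skip the monitor entirely, while READ_COMMITTED scanners keep the original ordering guarantee.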
[jira] [Commented] (HBASE-13451) Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators
[ https://issues.apache.org/jira/browse/HBASE-13451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571260#comment-14571260 ] ramkrishna.s.vasudevan commented on HBASE-13451: https://reviews.apache.org/r/35008/ - RB link. Make the HFileBlockIndex blockKeys to Cells so that it could be easy to use in the CellComparators -- Key: HBASE-13451 URL: https://issues.apache.org/jira/browse/HBASE-13451 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13451.patch After HBASE-10800 we could ensure that all the blockKeys in the BlockReader are converted to Cells (KeyOnlyKeyValue) so that we could use CellComparators. Note that this can be done only for the keys that are written using CellComparators and not for the ones using RawBytesComparator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13420) RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load
[ https://issues.apache.org/jira/browse/HBASE-13420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571253#comment-14571253 ] John Leach commented on HBASE-13420: Andrew, Sorry for the delay, I have been jumping around a bit. We just tested your patch during a data load of the LINE_ITEM table for the TPCC benchmark. Your change removed 140 seconds of blocked CPU for a 30M row load. Regards, John RegionEnvironment.offerExecutionLatency Blocks Threads under Heavy Load --- Key: HBASE-13420 URL: https://issues.apache.org/jira/browse/HBASE-13420 Project: HBase Issue Type: Improvement Reporter: John Leach Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Attachments: 1M-0.98.12.svg, 1M-0.98.13-SNAPSHOT.svg, HBASE-13420.patch, HBASE-13420.txt, hbase-13420.tar.gz, offerExecutionLatency.tiff Original Estimate: 3h Remaining Estimate: 3h The ArrayBlockingQueue blocks threads for 20s during a performance run focusing on creating numerous small scans. I see a buffer size of (100) private final BlockingQueue<Long> coprocessorTimeNanos = new ArrayBlockingQueue<Long>(LATENCY_BUFFER_SIZE); and then I see a drain coming from MetricsRegionWrapperImpl with 45 second executor HRegionMetricsWrapperRunable RegionCoprocessorHost#getCoprocessorExecutionStatistics() RegionCoprocessorHost#getExecutionLatenciesNanos() Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
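To illustrate the contention described in HBASE-13420: every offer to an ArrayBlockingQueue takes the queue's single lock, so many threads recording latencies into a 100-slot buffer that is drained only periodically end up queuing on that lock. The sketch below shows one possible lock-free alternative that drops samples once the buffer is full. It is an assumption-laden illustration, not the attached patch: LatencyBuffer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded sample buffer; not an HBase class.
final class LatencyBuffer {
  private final ConcurrentLinkedQueue<Long> samples = new ConcurrentLinkedQueue<>();
  private final AtomicInteger size = new AtomicInteger();
  private final int capacity;

  LatencyBuffer(int capacity) { this.capacity = capacity; }

  /** Records a sample without locking; returns false if the buffer is full. */
  boolean offer(long latencyNanos) {
    // Reserve a slot first so the bound holds under concurrent offers.
    if (size.incrementAndGet() > capacity) {
      size.decrementAndGet();
      return false; // dropping a metrics sample is cheaper than stalling a handler
    }
    samples.add(latencyNanos);
    return true;
  }

  /** Removes and returns everything currently buffered, e.g. from a periodic metrics task. */
  List<Long> drain() {
    List<Long> out = new ArrayList<>();
    Long v;
    while ((v = samples.poll()) != null) {
      size.decrementAndGet();
      out.add(v);
    }
    return out;
  }
}
```

The trade-off is that samples arriving between drains are lost once the buffer is full, which is acceptable only if the latencies feed approximate statistics.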
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571343#comment-14571343 ] Ted Yu commented on HBASE-13829: {code} /** Throttling based on the number of write request per time-unit */ {code} 'number of write request' -> 'number of write requests' {code} /** Throttling based on the number of read request per time-unit */ {code} Use plural for 'read request' Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 add simple throttling for hbase. But in the client, it doesn't support user to set ThrottleType like WRITE_NUM, WRITE_SIZE, READ_NUM, READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572101#comment-14572101 ] ramkrishna.s.vasudevan commented on HBASE-13817: bq.Can we have a consensus? I meant this version HBASE-13817.patch of the patch. Though it pollutes StreamUtils, it is still only one place, I thought. So after the latest API name change it is V3. So I would go with V3. ByteBufferOuputStream - add writeInt support Key: HBASE-13817 URL: https://issues.apache.org/jira/browse/HBASE-13817 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, HBASE-13817_V3.patch, benchmark.zip While writing Cells to this stream, to make the CellBlock ByteBuffer, we write the length of the cell as an int. We use StreamUtils to do this, which writes each byte one after the other. So that is 4 write calls on the stream (OutputStream has only this support). With ByteBufferOuputStream we have the overhead of checking the size limit, and possibly growing, with every write call. Internally this stream writes to a ByteBuffer. Inside the ByteBuffer implementations there are again position and limit checks. If we write this length as an int in one go we can reduce this overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
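The overhead argument in HBASE-13817 can be made concrete with a small sketch. This is not the HBase ByteBufferOutputStream or any of the attached patches: GrowableBufferStream is a hypothetical class that only mirrors the idea that a dedicated writeInt does one capacity check and one ByteBuffer.putInt call instead of four single-byte writes, each with its own checks. The sketch relies on ByteBuffer's default big-endian byte order.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical growable stream backed by a ByteBuffer; illustration only.
final class GrowableBufferStream extends OutputStream {
  private ByteBuffer buf;

  GrowableBufferStream(int initialCapacity) {
    buf = ByteBuffer.allocate(initialCapacity);
  }

  // Grows the backing buffer if fewer than 'extra' bytes remain.
  private void ensureRemaining(int extra) {
    if (buf.remaining() >= extra) {
      return;
    }
    int newCapacity = Math.max(buf.capacity() * 2, buf.position() + extra);
    ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
    buf.flip();      // switch to reading what has been written so far
    bigger.put(buf); // copy it into the larger buffer
    buf = bigger;
  }

  // Generic path: one capacity check and one put per byte.
  @Override public void write(int b) {
    ensureRemaining(1);
    buf.put((byte) b);
  }

  // Dedicated path: a single capacity check and a single putInt for all 4 bytes.
  void writeInt(int value) {
    ensureRemaining(Integer.BYTES);
    buf.putInt(value);
  }

  byte[] toByteArray() {
    byte[] out = new byte[buf.position()];
    ByteBuffer view = buf.duplicate();
    view.flip();
    view.get(out);
    return out;
  }
}
```

Writing a cell length through writeInt(len) replaces four write(int) calls, so the per-call size check and the ByteBuffer's own bounds checks run once rather than four times.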
[jira] [Updated] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12295: --- Status: Patch Available (was: Open) Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks does not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13834) Evict count not properly passed to HeapMemoryTuner.
[ https://issues.apache.org/jira/browse/HBASE-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572022#comment-14572022 ] Hadoop QA commented on HBASE-13834: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737417/EvictCountBug.patch against master branch at commit e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05. ATTACHMENT ID: 12737417 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14272//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14272//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14272//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14272//console This message is automatically generated. Evict count not properly passed to HeapMemoryTuner. --- Key: HBASE-13834 URL: https://issues.apache.org/jira/browse/HBASE-13834 Project: HBase Issue Type: Bug Components: hbase, regionserver Affects Versions: 2.0.0 Reporter: Abhilash Assignee: Abhilash Priority: Trivial Labels: easyfix Attachments: EvictCountBug.patch, HBASE-13834.patch The evict count that is calculated in the tune function of the HeapMemoryManager class and passed to HeapMemoryTuner via TunerContext is miscalculated. It is supposed to be the evict count between two intervals, but it is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
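The "evict count between two intervals" in HBASE-13834 amounts to taking the difference of a cumulative counter between successive tuner runs. A minimal sketch follows, assuming the cache exposes a monotonically increasing cumulative evict count. EvictDeltaTracker is a hypothetical helper, not the HeapMemoryManager code or the attached patch.

```java
// Hypothetical helper; illustration only.
final class EvictDeltaTracker {
  private long lastCumulativeEvictCount = 0L;

  /**
   * Takes the cache's cumulative evict count and returns how many
   * evictions happened since the previous call.
   */
  long evictedSinceLastTune(long cumulativeEvictCount) {
    long delta = cumulativeEvictCount - lastCumulativeEvictCount;
    // Remember the cumulative value, not the delta, for the next interval.
    lastCumulativeEvictCount = cumulativeEvictCount;
    return delta;
  }
}
```

The easy mistake in this pattern is storing the delta, or never updating the stored value, which makes later intervals report the wrong number.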
[jira] [Updated] (HBASE-13826) Unable to create table when group acls are appropriately set.
[ https://issues.apache.org/jira/browse/HBASE-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13826: --- Fix Version/s: (was: 0.98.14) 0.98.13 Unable to create table when group acls are appropriately set. - Key: HBASE-13826 URL: https://issues.apache.org/jira/browse/HBASE-13826 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13826.patch Steps for reproducing the issue. - Create user 'test' and group 'hbase-admin'. - Grant global create permissions to 'hbase-admin'. - Add user 'test' to 'hbase-admin' group. - Create table operation for 'test' user will throw ADE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572169#comment-14572169 ] ramkrishna.s.vasudevan commented on HBASE-12295: New RB link: https://reviews.apache.org/r/35045/ Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks do not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572198#comment-14572198 ] stack commented on HBASE-13817: --- +1 on v3. Thanks [~anoop.hbase] ByteBufferOuputStream - add writeInt support Key: HBASE-13817 URL: https://issues.apache.org/jira/browse/HBASE-13817 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, HBASE-13817_V3.patch, benchmark.zip While writing Cells to this stream to build the CellBlock ByteBuffer, we write the length of each cell as an int. We use StreamUtils to do this, which writes the bytes one after the other, so it takes 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream, every write call carries the overhead of checking the size limit and possibly growing the buffer. Internally this stream writes to a ByteBuffer, and the ByteBuffer implementations perform their own position and limit checks. If we write the length as an int in one go, we can reduce this overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
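The HBASE-13817 description above contrasts four single-byte writes with one int-sized write. A minimal sketch of the two paths using only JDK classes; `WriteIntSketch` and its method names are hypothetical, and this is not the code from the attached patches:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical illustration (not HBase code) of byte-at-a-time vs. single-call int writes. */
final class WriteIntSketch {
    /** Byte-at-a-time path: four separate write calls, most significant byte first. */
    static byte[] bytewise(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        out.write(v >>> 24); // write(int) keeps only the low 8 bits of its argument
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
        return out.toByteArray();
    }

    /** Single-call path: one putInt, so one bounds check for all four bytes. */
    static byte[] once(int v) {
        ByteBuffer buf = ByteBuffer.allocate(4); // new buffers default to big-endian order
        buf.putInt(v);
        return buf.array();
    }
}
```

Both paths produce the same four bytes; the difference the issue cares about is how many times the per-call capacity and position checks run, which this sketch does not measure.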
[jira] [Updated] (HBASE-13735) race condition for web interface during master start up
[ https://issues.apache.org/jira/browse/HBASE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-13735: - Status: Patch Available (was: Open) race condition for web interface during master start up --- Key: HBASE-13735 URL: https://issues.apache.org/jira/browse/HBASE-13735 Project: HBase Issue Type: Bug Components: master Affects Versions: 1.0.1 Reporter: Sean Busbey Assignee: Pankaj Kumar Priority: Minor Attachments: HBASE-13735.patch Loaded the master web page while the master was starting up and managed to hit an HTTP 500 with an NPE. {code} java.lang.NullPointerException at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.parse(MasterAddressTracker.java:236) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterInfoPort(MasterAddressTracker.java:86) at org.apache.hadoop.hbase.tmpl.master.BackupMasterStatusTmplImpl.renderNoFlush(BackupMasterStatusTmplImpl.java:53) at org.apache.hadoop.hbase.tmpl.master.BackupMasterStatusTmpl.renderNoFlush(BackupMasterStatusTmpl.java:113) at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:309) at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:373) at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:364) at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at 
org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1351) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-13833: - Fix Version/s: (was: 1.2.0) (was: 2.0.0) Status: Patch Available (was: Open) Not sure about other fix versions, need to check. LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad --- Key: HBASE-13833 URL: https://issues.apache.org/jira/browse/HBASE-13833 Project: HBase Issue Type: Bug Affects Versions: 1.1.0.1 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 1.1.1 Attachments: HBASE-13833.00.branch-1.1.patch, HBASE-13833.01.branch-1.1.patch, HBASE-13833.02.branch-1.1.patch Seems HBASE-13328 wasn't quite sufficient. {noformat} 015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS 2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled 2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK? 
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 ... ... 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception. 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:474) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:405) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:300) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:517) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.mapreduce.CsvBulkLoadTool$TableLoader.call(CsvBulkLoadTool.java:466) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2015-06-02 
05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172) 2015-06-02 05:01:56,122|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2015-06-02 05:01:56,124|beaver.machine|INFO|7800|2276|MainThread|at java.lang.Thread.run(Thread.java:745) ... ... ... 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|Caused by: org.apache.hadoop.hbase.client.NeedUnmanagedConnectionException: The connection has to be unmanaged. 2015-06-02 05:58:34,993|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:724) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getTable(ConnectionManager.java:708) 2015-06-02 05:58:34,994|beaver.machine|INFO|2828|7140|MainThread|at
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571945#comment-14571945 ] Guanghao Zhang commented on HBASE-13829: OK, I will explain in more detail in the next patch and add a unit test for it. Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let users set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13829) Add more ThrottleType
[ https://issues.apache.org/jira/browse/HBASE-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571947#comment-14571947 ] Guanghao Zhang commented on HBASE-13829: OK. I can do that. Add more ThrottleType - Key: HBASE-13829 URL: https://issues.apache.org/jira/browse/HBASE-13829 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 Attachments: HBASE-13829.patch HBASE-11598 added simple throttling for HBase, but the client does not let users set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE. REVIEW BOARD: https://reviews.apache.org/r/34989/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13827) Delayed scanner close in KeyValueHeap and StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572048#comment-14572048 ] Hadoop QA commented on HBASE-13827: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737428/HBASE-13827.patch against master branch at commit e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05. ATTACHMENT ID: 12737428 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14274//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14274//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14274//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14274//console This message is automatically generated. Delayed scanner close in KeyValueHeap and StoreScanner -- Key: HBASE-13827 URL: https://issues.apache.org/jira/browse/HBASE-13827 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13827.patch, HBASE-13827.patch This is to support the work in HBASE-12295. We have to return the blocks when close() is called on the HFileScanner. Right now HFileScanner has no close() at all; this issue adds one, and StoreFileScanner will call it from its own close(). In KVHeap, when one of the child scanners runs out of cells, we remove it from the PriorityQueue and also close it. StoreScanner does the same kind of thing. But once close() returns the blocks, this kind of early close is no longer correct, because cells created from these cached blocks might still be in use. This Jira aims to change these container scanners so that they do not close early. When a child scanner is no longer required, the container stops using it completely but does not call close(). Instead, the child is added to a separate list for a delayed close, and it is closed when close() is called on the container scanner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
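The delayed-close strategy described for HBASE-13827 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch: `ChildScanner`, `FakeScanner`, and `DelayedCloseContainer` are hypothetical stand-ins and not HBase types, and a plain list replaces the PriorityQueue for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a child scanner; not an HBase interface. */
interface ChildScanner {
    boolean hasMore();
    void close();
}

/** Trivial test double that records whether close() was called. */
final class FakeScanner implements ChildScanner {
    final boolean more;
    boolean closed;
    FakeScanner(boolean more) { this.more = more; }
    public boolean hasMore() { return more; }
    public void close() { closed = true; }
}

/** Container that stops using exhausted children but defers closing them. */
final class DelayedCloseContainer {
    private final List<ChildScanner> active = new ArrayList<>();
    private final List<ChildScanner> pendingClose = new ArrayList<>();

    void add(ChildScanner s) { active.add(s); }

    int activeCount() { return active.size(); }

    /** Moves exhausted children to the delayed-close list without calling close(). */
    void retireExhausted() {
        for (Iterator<ChildScanner> it = active.iterator(); it.hasNext(); ) {
            ChildScanner s = it.next();
            if (!s.hasMore()) {
                it.remove();
                pendingClose.add(s); // cells backed by its blocks may still be in use
            }
        }
    }

    /** Closes every child, retired or still active, when the container itself closes. */
    void close() {
        for (ChildScanner s : pendingClose) s.close();
        for (ChildScanner s : active) s.close();
        pendingClose.clear();
        active.clear();
    }
}
```

The design choice illustrated here is that retiring a child and releasing its resources are two separate steps, so anything still referencing the child's blocks stays valid until the container's own close().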
[jira] [Commented] (HBASE-13833) LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad
[ https://issues.apache.org/jira/browse/HBASE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572062#comment-14572062 ] Hadoop QA commented on HBASE-13833: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737429/HBASE-13833.02.branch-1.1.patch against branch-1.1 branch at commit e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05. ATTACHMENT ID: 12737429 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3811 checkstyle errors (more than the master's current 3810 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14273//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14273//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14273//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14273//console This message is automatically generated. LoadIncrementalHFile.doBulkLoad(Path,HTable) doesn't handle unmanaged connections when using SecureBulkLoad --- Key: HBASE-13833 URL: https://issues.apache.org/jira/browse/HBASE-13833 Project: HBase Issue Type: Bug Affects Versions: 1.1.0.1 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 1.1.1 Attachments: HBASE-13833.00.branch-1.1.patch, HBASE-13833.01.branch-1.1.patch, HBASE-13833.02.branch-1.1.patch Seems HBASE-13328 wasn't quite sufficient. {noformat} 015-06-02 05:49:23,578|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/_SUCCESS 2015-06-02 05:49:23,720|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO hfile.CacheConfig: CacheConfig:disabled 2015-06-02 05:49:23,859|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:49:23 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://dal-pqc1:8020/tmp/192f21dd-cc89-4354-8ba1-78d1f228e7c7/LARGE_TABLE/0/00870fd0a7544373b32b6f1e976bf47f first=\x80\x00\x00\x00 last=\x80LK? 
2015-06-02 05:50:32,028|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:32 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=68154 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 2015-06-02 05:50:52,128|beaver.machine|INFO|2828|7140|MainThread|15/06/02 05:50:52 INFO client.RpcRetryingCaller: Call exception, tries=11, retries=35, started=88255 ms ago, cancelled=false, msg=row '' on table 'LARGE_TABLE' at region=LARGE_TABLE,,1433222865285.e01e02483f30a060d3f7abb1846ea029., hostname=dal-pqc5,16020,1433222547221, seqNum=2 ... ... 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|15/06/02 05:01:56 ERROR mapreduce.CsvBulkLoadTool: Import job on table=LARGE_TABLE failed due to exception. 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|java.io.IOException: BulkLoad encountered an unrecoverable problem 2015-06-02 05:01:56,121|beaver.machine|INFO|7800|2276|MainThread|at
[jira] [Commented] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572061#comment-14572061 ] Hadoop QA commented on HBASE-13817: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737431/HBASE-13817_V3.patch against master branch at commit e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05. ATTACHMENT ID: 12737431 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14275//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14275//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14275//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14275//console This message is automatically generated. ByteBufferOuputStream - add writeInt support Key: HBASE-13817 URL: https://issues.apache.org/jira/browse/HBASE-13817 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, HBASE-13817_V3.patch, benchmark.zip While writing Cells to this stream to build the CellBlock ByteBuffer, we write the length of each cell as an int. We use StreamUtils to do this, which writes the bytes one after the other, so it takes 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream, every write call carries the overhead of checking the size limit and possibly growing the buffer. Internally this stream writes to a ByteBuffer, and the ByteBuffer implementations perform their own position and limit checks. If we write the length as an int in one go, we can reduce this overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13817: --- Attachment: HBASE-13817_V3.patch ByteBufferOuputStream - add writeInt support Key: HBASE-13817 URL: https://issues.apache.org/jira/browse/HBASE-13817 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, HBASE-13817_V3.patch, benchmark.zip While writing Cells to this stream to build the CellBlock ByteBuffer, we write the length of each cell as an int. We use StreamUtils to do this, which writes the bytes one after the other, so it takes 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream, every write call carries the overhead of checking the size limit and possibly growing the buffer. Internally this stream writes to a ByteBuffer, and the ByteBuffer implementations perform their own position and limit checks. If we write the length as an int in one go, we can reduce this overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13817) ByteBufferOuputStream - add writeInt support
[ https://issues.apache.org/jira/browse/HBASE-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13817: --- Hadoop Flags: Reviewed Status: Patch Available (was: Open) ByteBufferOuputStream - add writeInt support Key: HBASE-13817 URL: https://issues.apache.org/jira/browse/HBASE-13817 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13817.patch, HBASE-13817_V2.patch, HBASE-13817_V3.patch, benchmark.zip While writing Cells to this stream to build the CellBlock ByteBuffer, we write the length of each cell as an int. We use StreamUtils to do this, which writes the bytes one after the other, so it takes 4 write calls on the stream (OutputStream has only this support). With ByteBufferOutputStream, every write call carries the overhead of checking the size limit and possibly growing the buffer. Internally this stream writes to a ByteBuffer, and the ByteBuffer implementations perform their own position and limit checks. If we write the length as an int in one go, we can reduce this overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11520) Simplify offheap cache config by removing the confusing hbase.bucketcache.percentage.in.combinedcache
[ https://issues.apache.org/jira/browse/HBASE-11520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571988#comment-14571988 ] Hudson commented on HBASE-11520: SUCCESS: Integrated in Ambari-trunk-Commit #2804 (See [https://builds.apache.org/job/Ambari-trunk-Commit/2804/]) AMBARI-11552. 2.3 stack advisor doesn't take into account HBASE-11520 (Nick Dimiduk via srimanth) (sgunturi: http://git-wip-us.apache.org/repos/asf?p=ambari.git;a=commit;h=aeccbc7fe458509241e16c47f653f65a6ed8c2e4) * ambari-server/src/test/python/stacks/2.3/common/test_stack_advisor.py * ambari-server/src/main/resources/stacks/HDP/2.2/services/stack_advisor.py * ambari-server/src/main/resources/stacks/HDP/2.3/services/stack_advisor.py * ambari-server/src/test/python/stacks/2.2/common/test_stack_advisor.py Simplify offheap cache config by removing the confusing hbase.bucketcache.percentage.in.combinedcache --- Key: HBASE-11520 URL: https://issues.apache.org/jira/browse/HBASE-11520 Project: HBase Issue Type: Sub-task Components: io Affects Versions: 0.99.0 Reporter: stack Assignee: stack Fix For: 0.99.0, 2.0.0 Attachments: 11520.txt, 11520v2.txt, 11520v3.txt, 11520v3.txt Remove hbase.bucketcache.percentage.in.combinedcache. It is an unnecessary complication of the block cache config. Let the L1 config be the same whether or not an L2 is present: just set hfile.block.cache.size (not hbase.bucketcache.size * (1.0 - hbase.bucketcache.percentage.in.combinedcache)). For L2, let hbase.bucketcache.size be the actual size of the bucket cache, not hbase.bucketcache.size * hbase.bucketcache.percentage.in.combinedcache. The attached patch removes the config and updates the docs. It also adds tests to confirm that the configs are as expected for both a CombinedBlockCache deploy and a strict L1+L2 deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
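The sizing formulas in the HBASE-11520 description above can be worked through with a small calculation. This is a sketch only: `CacheSizing` and its methods are hypothetical names, the inputs are treated as plain numbers in whatever unit hbase.bucketcache.size is expressed in, and nothing here reads real HBase configuration.

```java
/** Hypothetical calculator (not HBase code) for the formulas quoted in the description. */
final class CacheSizing {
    /** Old L1 size: hbase.bucketcache.size * (1.0 - percentage.in.combinedcache). */
    static double oldL1(double bucketCacheSize, double percentageInCombinedCache) {
        return bucketCacheSize * (1.0 - percentageInCombinedCache);
    }

    /** Old L2 size: hbase.bucketcache.size * percentage.in.combinedcache. */
    static double oldL2(double bucketCacheSize, double percentageInCombinedCache) {
        return bucketCacheSize * percentageInCombinedCache;
    }

    /** New L2 size: hbase.bucketcache.size is taken as the actual bucket cache size. */
    static double newL2(double bucketCacheSize) {
        return bucketCacheSize;
    }
}
```

For example, with a bucket cache size of 1000 and a percentage of 0.75, the old scheme yields an L1 of 250 and an L2 of 750, while after the change the L2 is simply 1000 and the L1 is set independently through hfile.block.cache.size.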
[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV
[ https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572093#comment-14572093 ] Hadoop QA commented on HBASE-13835: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12737437/HBASE-13835-001.patch against master branch at commit e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05. ATTACHMENT ID: 12737437 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14277//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14277//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14277//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14277//console This message is automatically generated. KeyValueHeap.current might be in heap when exception happens in pollRealKV -- Key: HBASE-13835 URL: https://issues.apache.org/jira/browse/HBASE-13835 Project: HBase Issue Type: Bug Components: Scanners Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HBASE-13835-001.patch In a 0.94 HBase cluster, we found an NPE with the following stack: {code} Exception in thread regionserver21600.leaseChecker java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241) at org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302) at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033) at 
org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119) at java.lang.Thread.run(Thread.java:662) {code} Before this NPE, an exception occurred in pollRealKV, which we think is the culprit of the NPE. {code} ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader= at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366) at
[jira] [Commented] (HBASE-13831) TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
[ https://issues.apache.org/jira/browse/HBASE-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571939#comment-14571939 ]

Hudson commented on HBASE-13831:

FAILURE: Integrated in HBase-TRUNK #6544 (See [https://builds.apache.org/job/HBase-TRUNK/6544/])
HBASE-13831 TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+ (Stephen Jiang) (tedyu: rev e8e5a9f6398f5a99f1d89be359212a7a4f1d7b05)
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java

TestHBaseFsck#testParallelHbck is flaky against hadoop 2.6+
---
Key: HBASE-13831
URL: https://issues.apache.org/jira/browse/HBASE-13831
Project: HBase
Issue Type: Bug
Components: hbck, test
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Stephen Yuan Jiang
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.1.1
Attachments: HBASE-13831.patch

TestHBaseFsck#testParallelHbck is flaky against Hadoop 2.6+ environments. The idea of the test is that when two HBCK operations run simultaneously, the second HBCK fails without retrying, because creating the lock file fails once the first HBCK has already created it. However, with Hadoop 2.6+, the FileSystem#createFile call internally retries on AlreadyBeingCreatedException. See HBASE-13574 for more details: "It seems that the test is broken because of the new create retry policy in Hadoop 2.6. The NameNode proxy is now created with a custom RetryPolicy for AlreadyBeingCreatedException, which implies a timeout on this operation of up to HdfsConstants.LEASE_SOFTLIMIT_PERIOD (60 seconds)."

When I run the TestHBaseFsck#testParallelHbck test against Hadoop 2.7 in a Windows environment (HBase is branch-1.1) multiple times, the result is unpredictable: it sometimes succeeds and sometimes fails, with more failures than successes.

The fix is trivial: leverage the change in HBASE-13732 and reduce the max wait time to a smaller number.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
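
The lock-file behaviour described in HBASE-13831 can be illustrated with a small standalone sketch. This is not the HBase or HDFS code: it uses java.nio on the local filesystem as a stand-in, and the class name LockFileSketch, the method tryCreateLock, and its parameters are hypothetical names chosen for illustration. It only shows why a long retry window on "file already exists" makes the second caller slow to fail, and why a smaller maximum wait makes that failure prompt and predictable.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LockFileSketch {

    /**
     * Hypothetical helper: tries to create the lock file exclusively and
     * retries while the file already exists, until maxWaitMillis has elapsed.
     * Returns true if this caller created the file, false if it gave up.
     */
    public static boolean tryCreateLock(Path lock, long maxWaitMillis, long retryIntervalMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (true) {
            try {
                // Atomic create: throws if another caller already created the file.
                Files.createFile(lock);
                return true;
            } catch (FileAlreadyExistsException e) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // retry window exhausted, give up
                }
                Thread.sleep(retryIntervalMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("hbck-sketch");
        Path lock = dir.resolve("hbck.lock");

        // The first caller takes the lock immediately.
        boolean first = tryCreateLock(lock, 0, 10);

        // The second caller retries for at most 200 ms, then fails.
        long start = System.nanoTime();
        boolean second = tryCreateLock(lock, 200, 10);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;

        System.out.println("first acquired: " + first);
        System.out.println("second acquired: " + second + " after about " + elapsedMs + " ms");

        Files.delete(lock);
        Files.delete(dir);
    }
}
```

With a maximum wait in the range of the 60-second soft limit mentioned above, the second caller in this sketch would block for about a minute before failing, so a test that expects it to fail quickly becomes timing dependent. Reducing the maximum wait keeps the same outcome while bounding how long the test waits.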