[jira] [Commented] (HBASE-12346) Scan's default auths behavior under Visibility labels
[ https://issues.apache.org/jira/browse/HBASE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192978#comment-14192978 ]

Hadoop QA commented on HBASE-12346:
-----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12678663/HBASE-12346-master-v3.patch
against trunk revision .

ATTACHMENT ID: 12678663

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11552//console

This message is automatically generated.
Scan's default auths behavior under Visibility labels
-----------------------------------------------------
Key: HBASE-12346
URL: https://issues.apache.org/jira/browse/HBASE-12346
Project: HBase
Issue Type: Bug
Components: API, security
Affects Versions: 0.98.7, 0.99.1
Reporter: Jerry He
Fix For: 0.98.8, 0.99.2
Attachments: HBASE-12346-master-v2.patch, HBASE-12346-master-v3.patch, HBASE-12346-master.patch

In Visibility Labels security, a set of labels (auths) is administered and associated with a user. During a scan, a user can normally only see cells whose labels are part of the user's label set (auths). A Scan uses setAuthorizations to indicate which auths it wants to use to access the cells. Similarly in the shell:
{code}
scan 'table1', AUTHORIZATIONS => ['private']
{code}
But it is a surprise to find that setAuthorizations seems to be 'mandatory' in the default visibility label security setting. Every scan needs to call setAuthorizations before it can get any cells, even when the cells are under labels the requesting user holds. The following steps illustrate the issue. Run as superuser:
{code}
1. create a visibility label called 'private'
2. create 'table1'
3. put into 'table1' data and label the data as 'private'
4. set_auths 'user1', 'private'
5. grant 'user1', 'RW', 'table1'
{code}
Run as
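The surprising default described above can be sketched as a small illustrative model (plain Python, not HBase code; the function and data names are hypothetical): a scan with no explicit authorizations behaves as if it had an empty auth set, so labeled cells are filtered out even when the user's administered auths would cover them.

```python
def visible_cells(cells, scan_auths):
    """Toy model of visibility filtering.
    cells: list of (value, label) pairs; label is None for unlabeled cells.
    scan_auths: labels the scan explicitly requested, or None if unset."""
    auths = set(scan_auths or [])  # unset behaves like the empty set
    return [v for v, label in cells if label is None or label in auths]

table = [("row1-data", "private"), ("row2-data", None)]

# Without setAuthorizations, the 'private' cell is filtered out...
assert visible_cells(table, None) == ["row2-data"]
# ...even though 'private' is in user1's auth set; it must be repeated on the scan:
assert visible_cells(table, ["private"]) == ["row1-data", "row2-data"]
```

This is only a model of the observed behavior, not of HBase's VisibilityController implementation.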
[jira] [Commented] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
[ https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192983#comment-14192983 ]

Hadoop QA commented on HBASE-12406:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12678664/12406-0.98-v1.txt
against trunk revision .

ATTACHMENT ID: 12678664

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11553//console

This message is automatically generated.
Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
---------------------------------------------------------------------
Key: HBASE-12406
URL: https://issues.apache.org/jira/browse/HBASE-12406
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Fix For: 0.98.8
Attachments: 12406-0.98-v1.txt

From https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/614/testReport/org.apache.hadoop.hbase.mapreduce/TestCopyTable/testCopyTableWithBulkload/ :
{code}
java.io.IOException: Unmatched family names found: unmatched family names in HFiles to be bulkloaded: [_logs]; valid family names of table testCopyTable2 are: [family]
  at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:268)
  at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:907)
  at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:344)
{code}
The above failure was due to the presence of a history directory under the _logs directory, e.g.
{code}
hdfs://nn:59313/user/tyu/copytable/4282249372082687850/_logs/history
{code}
HBASE-12375 removed the check for directory names which start with
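The failure above happens because the bulk loader treats every subdirectory of the output (here the MapReduce bookkeeping directory `_logs`) as a column family and validates it against the table schema. A minimal sketch of the kind of filter involved, assuming (from the `[_logs]` error and the truncated reference to HBASE-12375) that the removed check skipped underscore-prefixed internal directories; the function names are illustrative, not HBase's actual API:

```python
def candidate_families(subdirs):
    """Drop MapReduce bookkeeping dirs (e.g. _logs, _SUCCESS) so that only
    real column-family directories are validated against the table schema."""
    return [d for d in subdirs if not d.startswith("_")]

def unmatched_families(subdirs, table_families):
    """Families present in the HFile output but absent from the table."""
    return sorted(set(candidate_families(subdirs)) - set(table_families))

# With the underscore filter, _logs no longer trips the
# "Unmatched family names" check seen in the stack trace above:
assert unmatched_families(["family", "_logs"], ["family"]) == []
```

Without the filter, `_logs` would appear in the unmatched set and the bulk load would abort exactly as in the reported IOException.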
[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-12363:
----------------------------------
Attachment: 12363-master.txt

Here's a patch.
* Adds a new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 'false' strings)
* 100% compatible in the shell (arg.to_s.upcase, so booleans and strings will work exactly as before)
* The only difference is that a newly created table will show 'TRUE' instead of 'true'; even that is forward compatible for the old case, as the old code will try to parse it as a Boolean
* Added tests

Now, ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some easy simplifications here I'm happy to incorporate them. I think it's time to refactor it... for another jira.

TL;DR: with KEEP_DELETED_CELLS=TTL, deleted cells *and* their delete markers are removed when the TTL expires (regardless of the MIN_VERSIONS setting). I.e. one can keep TTL + MIN_VERSIONS and still get rid of old deleted rows. We could even add another enum value, MARKERS_ONLY, and remove the hbase.hstore.time.to.purge.deletes config option, but that's also another jira.

KEEP_DELETED_CELLS considered harmful?
--------------------------------------
Key: HBASE-12363
URL: https://issues.apache.org/jira/browse/HBASE-12363
Project: HBase
Issue Type: Sub-task
Components: regionserver
Reporter: Lars Hofhansl
Labels: Phoenix
Attachments: 12363-master.txt, 12363-test.txt

Brainstorming... This morning in the train (of all places) I realized a fundamental issue in how KEEP_DELETED_CELLS is implemented. The problem is around knowing when it is safe to remove a delete marker (we cannot remove it unless all cells affected by it are removed otherwise). This was particularly hard for family markers, since they sort before all cells of a row, and hence scanning forward through an HFile you cannot know whether the family markers are still needed until at least the entire row is scanned.

My solution was to keep the TS of the oldest put in any given HFile, and only remove delete markers older than that TS. That sounds good on the face of it... But now imagine you wrote a version of ROW 1 and then never update it again. Then later you write a billion other rows and delete them all. Since the TS of the cells in ROW 1 is older than all the delete markers for the other billion rows, these will never be collected... At least not for the region that hosts ROW 1 after a major compaction. Note, in a sense that is what HBase is supposed to do when keeping deleted cells: keep them until they would be removed by some other means (for example TTL, or MAX_VERSIONS when new versions are inserted). The specific problem here is that even when all KVs affected by a delete marker have expired this way, the marker would not be removed if there is just one older KV in the HStore. I don't see a good way out of this. In the parent I outlined these four options:
# Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not apply to deleted rows or delete marker rows (wouldn't know how long to keep family deletes in that case). (MAX)VERSIONS would still be enforced on all row types except for family delete markers.
# Translate family delete markers to column delete markers at (major) compaction time.
# Change HFileWriterV* to keep track of the earliest put TS in a store and write it to the file metadata. Use that to expire delete markers that are older and hence can't affect any puts in the file.
# Have Store.java keep track of the earliest put in internalFlushCache and compactStore and then append it to the file metadata. That way HFileWriterV* would not need to know about KVs.
And I implemented #4. I'd love to get input on ideas.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
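The earliest-put-TS approach (options #3/#4) and its pathology can be sketched as a toy model (plain Python, not HBase code; names are illustrative): a delete marker is only safe to purge if it is strictly older than the earliest put timestamp in the store, since a younger marker could still mask some put.

```python
def purgeable_markers(put_timestamps, marker_timestamps):
    """Toy model of the HFile-metadata rule: only delete markers older than
    the earliest put TS in the store can be purged, because a marker that is
    newer than some put might still be masking that put."""
    earliest_put = min(put_timestamps)
    return [ts for ts in marker_timestamps if ts < earliest_put]

# The pathology described above: one ancient put in ROW 1 pins every
# younger delete marker in the store forever.
puts = [100]               # ROW 1, written once, never updated
markers = [500, 600, 700]  # delete markers for the other (billion) rows
assert purgeable_markers(puts, markers) == []  # nothing is ever collected
```

If the ancient put is absent (say all puts are newer than some markers), those older markers become collectible, which is exactly why the rule is correct but can be arbitrarily pessimistic.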
[jira] [Comment Edited] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192990#comment-14192990 ]

Lars Hofhansl edited comment on HBASE-12363 at 11/1/14 6:38 AM:
----------------------------------------------------------------

Here's a patch.
* Adds a new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 'false' strings)
* 100% compatible in the shell (arg.to_s.upcase, so booleans and strings will work exactly as before)
* The only difference is that a newly created table will show 'TRUE' instead of 'true'; even that is forward compatible for the old case, as the old code will try to parse it as a Boolean
* Added tests

Now, ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some easy simplifications here I'm happy to incorporate them. I think it's time to refactor it... for another jira.

TL;DR: with KEEP_DELETED_CELLS=TTL, deleted cells *and* their delete markers are removed when the TTL expires (regardless of the MIN_VERSIONS setting). I.e. one can keep TTL + MIN_VERSIONS and still get rid of old deleted rows. We could even add another enum value, MARKERS_ONLY, and remove the hbase.hstore.time.to.purge.deletes config option, but that's also another jira.

was (Author: lhofhansl):
Here's a patch.
* Adds new TTL option to KEEP_DELETED_CELLS
* 100% backwards compatible in HColumnDescriptor (can parse the old 'true', 'false' string)
* 100% compatible in shell (arg.to_s.upcase to boolean and strings will work exactly as before)
* the only difference is that a newly created table will show 'TRUE' instead 'true', even that is compatible forward compatible for old case, as the old code will try to parse it as Boolean
* added tests
Now, ScanQueryMatcher doesn't exactly look nicer now. If somebody suggests some easy simplifications here I'm happy to incorporate them. It's think it's time to refactor it... For another jira.
TL;DR: with KEEP_DELETED_CELLS=TTL deleted cells *and* their delete markers are removed when the TTL expired (regardless of MIN_VERSION setting). I.e. one can keep TTL + MIN_VERSIONS and still get rid of old deleted rows. We could even add another enum: MAKERS_ONLY and remove the hbase.hstore.time.to.purge.deletes config option, but that's also another jira.
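The backward-compatible KEEP_DELETED_CELLS parsing described in the patch notes above can be sketched as a small model (plain Python, not HBase's HColumnDescriptor code; the enum and function names are illustrative, mirroring the TRUE/FALSE/TTL values the patch introduces): uppercasing the stored string first, as the shell's arg.to_s.upcase does, lets the old boolean strings keep working while adding the new TTL mode.

```python
from enum import Enum

class KeepDeletedCells(Enum):
    """Model of the three-valued setting the patch adds."""
    FALSE = "FALSE"
    TRUE = "TRUE"
    TTL = "TTL"

def parse_keep_deleted_cells(value):
    # Uppercase first (like arg.to_s.upcase in the shell), so booleans and
    # the old lowercase 'true'/'false' strings map onto the enum unchanged.
    return KeepDeletedCells[str(value).upper()]

assert parse_keep_deleted_cells("true") is KeepDeletedCells.TRUE   # old style
assert parse_keep_deleted_cells("TTL") is KeepDeletedCells.TTL     # new mode
```

A table created with the new code then serializes the value as 'TRUE', which old code can still parse as a Boolean, matching the forward-compatibility claim in the comment.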
[jira] [Assigned] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl reassigned HBASE-12363:
-------------------------------------
Assignee: Lars Hofhansl
[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-12363:
----------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192991#comment-14192991 ]

Hadoop QA commented on HBASE-12363:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12678669/12363-master.txt
against trunk revision .

ATTACHMENT ID: 12678669

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 27 new or modified tests.
{color:red}-1 javac{color}. The patch appears to cause the mvn compile goal to fail. Compilation errors resume:
{code}
[ERROR] COMPILATION ERROR :
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,23] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[429,36] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[794,10] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[819,48] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,63] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[798,14] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,61] cannot find symbol
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,85] cannot find symbol
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project hbase-client: Compilation failure: Compilation failure:
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,23] cannot find symbol
[ERROR] symbol: class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[429,36] cannot find symbol
[ERROR] symbol: class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[794,10] cannot find symbol
[ERROR] symbol: class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[819,48] cannot find symbol
[ERROR] symbol: class KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[162,63] cannot find symbol
[ERROR] symbol: variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[798,14] cannot find symbol
[ERROR] symbol: variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,61] cannot find symbol
[ERROR] symbol: variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:[811,85] cannot find symbol
[ERROR] symbol: variable KeepDeletedCells
[ERROR] location: class org.apache.hadoop.hbase.HColumnDescriptor
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
{code}
[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-12363:
----------------------------------
Attachment: (was: 12363-master.txt)
[jira] [Updated] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12363: -- Attachment: 12363-master.txt Whoops... Correct version this time. KEEP_DELETED_CELLS considered harmful? -- Key: HBASE-12363 URL: https://issues.apache.org/jira/browse/HBASE-12363 Project: HBase Issue Type: Sub-task Components: regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Labels: Phoenix Attachments: 12363-master.txt, 12363-test.txt Brainstorming... This morning in the train (of all places) I realized a fundamental issue in how KEEP_DELETED_CELLS is implemented. The problem is around knowing when it is safe to remove a delete marker (we cannot remove it unless all cells affected by it are remove otherwise). This was particularly hard for family marker, since they sort before all cells of a row, and hence scanning forward through an HFile you cannot know whether the family markers are still needed until at least the entire row is scanned. My solution was to keep the TS of the oldest put in any given HFile, and only remove delete markers older than that TS. That sounds good on the face of it... But now imagine you wrote a version of ROW 1 and then never update it again. Then later you write a billion other rows and delete them all. Since the TS of the cells in ROW 1 is older than all the delete markers for the other billion rows, these will never be collected... At least for the region that hosts ROW 1 after a major compaction. Note, in a sense that is what HBase is supposed to do when keeping deleted cells: Keep them until they would be removed by some other means (for example TTL, or MAX_VERSION when new versions are inserted). The specific problem here is that even as all KVs affected by a delete marker are expired this way the marker would not be removed if there just one older KV in the HStore. I don't see a good way out of this. 
In parent I outlined these four solutions: So there are four options, I think: # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not apply to deleted rows or delete marker rows (we wouldn't know how long to keep family deletes in that case). (MAX)VERSIONS would still be enforced on all row types except for family delete markers. # Translate family delete markers to column delete markers at (major) compaction time. # Change HFileWriterV* to keep track of the earliest put TS in a store and write it to the file metadata. Use that to expire delete markers that are older and hence can't affect any puts in the file. # Have Store.java keep track of the earliest put in internalFlushCache and compactStore and then append it to the file metadata. That way HFileWriterV* would not need to know about KVs. And I implemented #4. I'd love to get input on ideas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
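Option #4 above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the idea (tracking the earliest put timestamp while writing a store file, then using it to decide marker removal), not the actual Store.java or HFileWriterV* code:

```java
// Sketch of option #4 above: track the earliest put timestamp while writing
// a store file, then use it to decide whether a delete marker can be dropped.
// Class and method names are illustrative, not the actual Store.java code.
public class EarliestPutTracker {
    private long earliestPutTs = Long.MAX_VALUE;

    // Called for every put cell appended to the file being written
    // (e.g. from internalFlushCache or compactStore).
    public void trackPut(long ts) {
        if (ts < earliestPutTs) {
            earliestPutTs = ts;
        }
    }

    // This value would be appended to the store file's metadata at close time.
    public long getEarliestPutTs() {
        return earliestPutTs;
    }

    // A delete marker strictly older than every put in the store cannot mask
    // anything in it, so a major compaction may drop the marker.
    public boolean canDropDeleteMarker(long deleteMarkerTs) {
        return deleteMarkerTs < earliestPutTs;
    }

    public static void main(String[] args) {
        EarliestPutTracker tracker = new EarliestPutTracker();
        tracker.trackPut(100L);
        tracker.trackPut(50L);
        // The ROW 1 scenario from the description: one very old put (ts=50)
        // keeps every newer delete marker (e.g. ts=60) alive forever.
        System.out.println(tracker.canDropDeleteMarker(40L)); // true
        System.out.println(tracker.canDropDeleteMarker(60L)); // false
    }
}
```

Note how the sketch also exhibits the pathology the description complains about: a single old put pins all newer delete markers in the store.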
[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193026#comment-14193026 ] Hadoop QA commented on HBASE-12363: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678670/12363-master.txt against trunk revision . ATTACHMENT ID: 12678670 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 27 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3784 checkstyle errors (more than the trunk's current 3781 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +return setValue(KEEP_DELETED_CELLS, (keepDeletedCells ? KeepDeletedCells.TRUE : KeepDeletedCells.FALSE).toString()); +this.keepDeletedCells = scan.isRaw() ? KeepDeletedCells.TRUE : isUserScan ? 
KeepDeletedCells.FALSE : scanInfo.getKeepDeletedCells(); +this.seePastDeleteMarkers = scanInfo.getKeepDeletedCells() != KeepDeletedCells.FALSE isUserScan; +ScanInfo scanInfo = new ScanInfo(null, 0, 1, HConstants.LATEST_TIMESTAMP, KeepDeletedCells.FALSE, + family.setKeepDeletedCells(org.apache.hadoop.hbase.KeepDeletedCells.valueOf(arg.delete(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS).to_s.upcase)) if arg.include?(org.apache.hadoop.hbase.HColumnDescriptor::KEEP_DELETED_CELLS) {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/patchReleaseAuditWarnings.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11555//console This message is automatically generated. KEEP_DELETED_CELLS considered harmful? -- Key: HBASE-12363 URL: https://issues.apache.org/jira/browse/HBASE-12363 Project: HBase Issue Type: Sub-task Components: regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Labels: Phoenix Attachments: 12363-master.txt, 12363-test.txt
[jira] [Reopened] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima Spivak reopened HBASE-12285: - Lots of failing builds recently with {{Stream Closed}} being replaced with {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18-SNAPSHOT:test (secondPartTestsExecution) on project hbase-server: There was a timeout or other error in the fork - [Help 1] {code} since we switched to Surefire 2.18-SNAPSHOT. I'm also still bothered by not being able to answer [~stack]'s question of why this was only hitting branch-1 (even when using the known-faulty 2.17 version), so I'm reopening this. Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Fix For: 2.0.0, 0.99.2 Attachments: HBASE-12285_branch-1_v1.patch, HBASE-12285_branch-1_v1.patch Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12405) WAL accounting by Store
[ https://issues.apache.org/jira/browse/HBASE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangduo updated HBASE-12405: - Description: HBASE-10201 has made flush decisions per Store, but has not done enough work on HLog, so there are two problems: 1. We record minSeqId both in HRegion and FSHLog, which is a duplication. 2. There may be holes in WAL accounting. For example, assume family A with sequence ids 1 and 3, and family B with seqId 2. If we flush family A, we can only record that the WAL before sequence id 1 can be removed safely. If we do a replay at this point, sequence id 3 will also be replayed, which is unnecessary. was: HBASE-10201 has made flush decisions per Store, but has not done enough work on HLog, so there are two problems: 1. We record minSeqId both in HRegion and FSHLog, which is a duplication. 2. There may be holes in WAL accounting. For example, assume family A with sequence ids 1 and 3, and family B with seqId 2. If we flush family A, we can only record that the WAL before sequence id 1 can be removed safely. If we do a replay at this point, sequence id 4 will also be replayed, which is unnecessary. WAL accounting by Store --- Key: HBASE-12405 URL: https://issues.apache.org/jira/browse/HBASE-12405 Project: HBase Issue Type: Improvement Components: wal Reporter: zhangduo Assignee: zhangduo HBASE-10201 has made flush decisions per Store, but has not done enough work on HLog, so there are two problems: 1. We record minSeqId both in HRegion and FSHLog, which is a duplication. 2. There may be holes in WAL accounting. For example, assume family A with sequence ids 1 and 3, and family B with seqId 2. If we flush family A, we can only record that the WAL before sequence id 1 can be removed safely. If we do a replay at this point, sequence id 3 will also be replayed, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
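The per-Store accounting the issue asks for can be sketched as follows. This is an illustrative shape under assumed names (PerStoreWalAccounting and its methods are hypothetical, not the actual HRegion/FSHLog code): track the lowest unflushed sequence id per family and derive the safe WAL truncation point from the minimum across families.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not the actual HRegion/FSHLog code: track the lowest
// unflushed sequence id per family and derive the WAL truncation point.
public class PerStoreWalAccounting {
    private final Map<String, Long> lowestUnflushedSeqId = new HashMap<>();

    // Record a WAL append for a family; keep the lowest unflushed seqId.
    public void append(String family, long seqId) {
        lowestUnflushedSeqId.merge(family, seqId, Math::min);
    }

    // A flush persists all of the family's edits; its entry goes away.
    public void flushed(String family) {
        lowestUnflushedSeqId.remove(family);
    }

    // WAL entries strictly below this sequence id can be archived safely.
    public long safeTruncationPoint(long nextSeqId) {
        return lowestUnflushedSeqId.values().stream()
                .mapToLong(Long::longValue).min().orElse(nextSeqId);
    }

    public static void main(String[] args) {
        PerStoreWalAccounting acc = new PerStoreWalAccounting();
        acc.append("A", 1L);
        acc.append("B", 2L);
        acc.append("A", 3L);
        acc.flushed("A");
        // B still holds seqId 2, so only entries below 2 are removable.
        // A crash replay starting at 2 would also replay A's seqId-3 edit,
        // which was already flushed -- the accounting hole described above.
        System.out.println(acc.safeTruncationPoint(4L)); // 2
    }
}
```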
[jira] [Updated] (HBASE-12393) The regionserver web will throw exception if we disable block cache
[ https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ChiaPing Tsai updated HBASE-12393: -- Labels: patch (was: ) Status: Patch Available (was: Open) To avoid invoking a disabled block cache's methods, we use an additional statement (else if) to evaluate the value of the block cache. If the block cache is null, the block cache stats web page will display "Block Cache is disabled". The regionserver web will throw exception if we disable block cache --- Key: HBASE-12393 URL: https://issues.apache.org/jira/browse/HBASE-12393 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.7 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2 Reporter: ChiaPing Tsai Priority: Minor Labels: patch Attachments: HBASE-12393.patch CacheConfig.getBlockCache() will return null when we set hfile.block.cache.size to zero. This causes BlockCacheTmplImpl.java:123 to throw a NullPointerException. {code} org.jamon.escaping.Escaping.HTML.write(org.jamon.emit.StandardEmitter.valueOf(StringUtils.humanReadableInt(cacheConfig.getBlockCache().size())), jamonWriter); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
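The guard described above amounts to a null check before any BlockCache method is called. A minimal sketch of that shape (class and method names are hypothetical; the real fix lives in the BlockCacheTmpl Jamon template, not a plain class like this):

```java
// Minimal sketch of the null guard described above. Names are hypothetical;
// the real fix lives in the auto-generated BlockCacheTmplImpl Jamon code.
public class BlockCacheStatsRenderer {
    public static String render(Object blockCache) {
        if (blockCache == null) {
            // hfile.block.cache.size=0 disables the cache, so there are no
            // stats to render -- show a message instead of throwing an NPE.
            return "Block Cache is disabled";
        }
        return "Block Cache stats: " + blockCache;
    }

    public static void main(String[] args) {
        System.out.println(render(null)); // Block Cache is disabled
    }
}
```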
[jira] [Commented] (HBASE-12393) The regionserver web will throw exception if we disable block cache
[ https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193136#comment-14193136 ] Hadoop QA commented on HBASE-12393: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678584/HBASE-12393.patch against trunk revision . ATTACHMENT ID: 12678584 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3782 checkstyle errors (more than the trunk's current 3781 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11556//console This message is automatically generated. 
The regionserver web will throw exception if we disable block cache --- Key: HBASE-12393 URL: https://issues.apache.org/jira/browse/HBASE-12393 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.7 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2 Reporter: ChiaPing Tsai Priority: Minor Labels: patch Attachments: HBASE-12393.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
[ https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193181#comment-14193181 ] Anoop Sam John commented on HBASE-12406: Any other such 'to be excluded' dirs? Ping [~ashish singhi] Bulk load fails in 0.98 against hadoop-1 due to unmatched family name - Key: HBASE-12406 URL: https://issues.apache.org/jira/browse/HBASE-12406 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.98.8 Attachments: 12406-0.98-v1.txt From https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/614/testReport/org.apache.hadoop.hbase.mapreduce/TestCopyTable/testCopyTableWithBulkload/ : {code} java.io.IOException: Unmatched family names found: unmatched family names in HFiles to be bulkloaded: [_logs]; valid family names of table testCopyTable2 are: [family] at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:268) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:907) at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:344) {code} The above failure was due to the presence of history directory under _logs directory. e.g. {code} hdfs://nn:59313/user/tyu/copytable/4282249372082687850/_logs/history {code} HBASE-12375 removed check for directory name which starts with underscore -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12393) The regionserver web will throw exception if we disable block cache
[ https://issues.apache.org/jira/browse/HBASE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193208#comment-14193208 ] ChiaPing Tsai commented on HBASE-12393: --- {quote} -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {quote} No new UT was added since this is only a message change in the UI. The manual steps are shown below: # Set hfile.block.cache.size to zero. # Open the RegionServer UI; there is no NullPointerException anymore. # Click on Stats of the Block Cache; the message "Block Cache is disabled" will appear. {quote} -1 checkstyle. The applied patch generated 3782 checkstyle errors (more than the trunk's current 3781 errors). {quote} BlockCacheTmplImpl.java is the auto-generated Jamon implementation. The whitespace errors are due to the code style of Jamon. The regionserver web will throw exception if we disable block cache --- Key: HBASE-12393 URL: https://issues.apache.org/jira/browse/HBASE-12393 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.7 Environment: ubuntu 12.04 64bits, hadoop-2.2.0, hbase-0.98.7-hadoop2 Reporter: ChiaPing Tsai Priority: Minor Labels: patch Attachments: HBASE-12393.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12406) Bulk load fails in 0.98 against hadoop-1 due to unmatched family name
[ https://issues.apache.org/jira/browse/HBASE-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-12406: -- Assignee: Ted Yu Bulk load fails in 0.98 against hadoop-1 due to unmatched family name - Key: HBASE-12406 URL: https://issues.apache.org/jira/browse/HBASE-12406 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.8 Attachments: 12406-0.98-v1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193213#comment-14193213 ] Ted Yu commented on HBASE-12363: What if a table with KEEP_DELETED_CELLS set to TTL is exported to a cluster which is running an older release? Would the exported table be parsed correctly? KEEP_DELETED_CELLS considered harmful? -- Key: HBASE-12363 URL: https://issues.apache.org/jira/browse/HBASE-12363 Project: HBase Issue Type: Sub-task Components: regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Labels: Phoenix Attachments: 12363-master.txt, 12363-test.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout
[ https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-12403: - Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to 0.98+. Thanks folks. IntegrationTestMTTR flaky due to aggressive RS restart timeout -- Key: HBASE-12403 URL: https://issues.apache.org/jira/browse/HBASE-12403 Project: HBase Issue Type: Test Components: integration tests Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12403.00.patch TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS must connect to the Master before it can be online, this is not enough time in an environment where the Master can also be killed. The failure from the console says the test failed because a RestartRsHoldingMetaAction timed out. {noformat} Caused by: java.io.IOException: did timeout waiting for region server to start:ip-172-31-42-248.ec2.internal at org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153) at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93) at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52) at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} This is only reported at the end of the test run. There's no indication as to when during the test run this failure happened. 
The timeout on the start RS operation is 60 seconds. Hacking out the start/stop messages from the logs during the time window when this test ran, it appears that at one point the RS took 2min 12s between when it was launched and when it reported for duty: {noformat} Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248 2014-10-31 14:55:29,049 INFO [regionserver60020] regionserver.HRegionServer: Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118 {noformat} The RS came up without incident. It spent 1min 4s of that time waiting on the master to start, and attempted to report for duty from 14:54:28 to 14:55:24. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193306#comment-14193306 ] Ted Yu commented on HBASE-12394: Mind putting the patch on reviewboard? hbase.mapreduce.scan.regionspermapper controls how many mappers would be used. Have you considered specifying the number of mappers for this feature? Thanks Support multiple regions as input to each mapper in map/reduce jobs --- Key: HBASE-12394 URL: https://issues.apache.org/jira/browse/HBASE-12394 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 2.0.0, 0.98.6.1 Reporter: Weichen Ye Attachments: HBASE-12394.patch For a Hadoop cluster, a job with a large HBase table as input always consumes a large amount of computing resources. For example, we need to create a job with 1000 mappers to scan a table with 1000 regions. This patch is to support one mapper using multiple regions as input. The following new files are included in this patch: TableMultiRegionInputFormat.java TableMultiRegionInputFormatBase.java TableMultiRegionMapReduceUtil.java *TestTableMultiRegionInputFormatScan1.java *TestTableMultiRegionInputFormatScan2.java *TestTableMultiRegionInputFormatScanBase.java *TestTableMultiRegionMapReduceUtil.java The files starting with * are tests. In order to support multiple regions for one mapper, we need a new configuration property: hbase.mapreduce.scan.regionspermapper. This is an example, which means each mapper has 3 regions as input: {code} <property> <name>hbase.mapreduce.scan.regionspermapper</name> <value>3</value> </property> {code} This is an example for Java code: TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
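The grouping the patch describes amounts to packing hbase.mapreduce.scan.regionspermapper consecutive regions into one input split. A minimal sketch of that packing logic (hypothetical class and method names, not the actual TableMultiRegionInputFormat code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the region-to-split packing described above. Names are
// hypothetical, not the actual TableMultiRegionInputFormatBase code.
public class RegionGrouper {
    public static List<List<String>> group(List<String> regions, int regionsPerMapper) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < regions.size(); i += regionsPerMapper) {
            int end = Math.min(i + regionsPerMapper, regions.size());
            // Each split (one mapper) scans a run of consecutive regions.
            splits.add(new ArrayList<>(regions.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        // With regionspermapper=3, seven regions collapse into three mappers.
        System.out.println(group(regions, 3).size()); // 3
    }
}
```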
[jira] [Updated] (HBASE-12399) Master startup race between metrics and RpcServer
[ https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-12399: - Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to 0.98+ Master startup race between metrics and RpcServer - Key: HBASE-12399 URL: https://issues.apache.org/jira/browse/HBASE-12399 Project: HBase Issue Type: Bug Components: master Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: 12399.patch, HBASE-12399.00.patch Seeing this on CM tests with frequent master thrashing {noformat} 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81) at org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43) at org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
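The NPE above fires because the metrics timer can run before the Master's RpcServer (and its scheduler) exist. The usual remedy is a null-tolerant wrapper that returns a default until startup completes; a sketch of that shape (illustrative names only, not the committed MetricsHBaseServerWrapperImpl patch):

```java
// Sketch of a null-tolerant metrics wrapper: if the metrics timer fires
// before the RpcServer's scheduler exists, return a default instead of
// dereferencing null. Illustrative shape, not the committed patch.
public class NullSafeMetricsWrapper {
    interface Scheduler {
        int getGeneralQueueLength();
    }

    private volatile Scheduler scheduler; // set once the RpcServer is up

    public void setScheduler(Scheduler s) {
        this.scheduler = s;
    }

    public int getGeneralQueueLength() {
        Scheduler s = scheduler; // read the field once; may be null at startup
        return s == null ? 0 : s.getGeneralQueueLength();
    }

    public static void main(String[] args) {
        NullSafeMetricsWrapper wrapper = new NullSafeMetricsWrapper();
        System.out.println(wrapper.getGeneralQueueLength()); // 0 before startup, no NPE
        wrapper.setScheduler(() -> 5);
        System.out.println(wrapper.getGeneralQueueLength()); // 5
    }
}
```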
[jira] [Commented] (HBASE-12402) ZKPermissionWatcher race condition in refreshing the cache leaving stale ACLs and causing AccessDenied
[ https://issues.apache.org/jira/browse/HBASE-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193372#comment-14193372 ] Enis Soztutar commented on HBASE-12402: --- I have run IntegrationTestIngest with CM on a cluster of 4 nodes 10 times to test the change. It seems good to go. ZKPermissionWatcher race condition in refreshing the cache leaving stale ACLs and causing AccessDenied -- Key: HBASE-12402 URL: https://issues.apache.org/jira/browse/HBASE-12402 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: hbase-12402_v1.patch In testing, we have seen an issue where a region in a newly created table will throw AccessDeniedException. There seems to be a race condition in the ZKPermissionWatcher when it is just starting up and a new table is created around the same time. The master has just created the table, and adds permissions to the acl table: {code} 2014-10-30 19:21:26,494 DEBUG [MASTER_TABLE_OPERATIONS-ip-172-31-32-87:6-0] access.AccessControlLists: Writing permission with rowKey loadtest_d1 hrt_qa: RWXCA {code} One of the region servers is just starting: {code} Thu Oct 30 19:21:11 UTC 2014 Starting regionserver on ip-172-31-32-90 2014-10-30 19:21:13,915 INFO [main] util.VersionInfo: HBase 0.98.4.2.2.0.0-1194-hadoop2 {code} The node creation event is received: {code} 2014-10-30 19:21:26,764 DEBUG [regionserver60020-EventThread] access.ZKPermissionWatcher: Updating permissions cache from node loadtest_d1 with data: PBUF\x0A0\x0A\x06hrt_qa\x12\x08\x03\x0A\x16\x0A\x07default\x12\x0Bloadtest_d1 \x00 \x01 \x02 \x03 \x04 {code} which puts the right data into the cache, only to be invalidated shortly after: {code} ... 
2014-10-30 19:21:26,855 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] access.ZKPermissionWatcher: Updating permissions cache from node tabletwo_copytable_cell_versions_two with data: PBUF\x0AI\x0A\x06hrt_qa\x12?\x08\x03;\x0A/\x0A\x07default\x12$tabletwo_copytable_cell_versions_two \x00 \x01 \x02 \x03 \x04 2014-10-30 19:21:26,856 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] access.ZKPermissionWatcher: Updating permissions cache from node loadtest_d1 with data: PBUF 2014-10-30 19:21:26,856 DEBUG [RS_OPEN_REGION-ip-172-31-32-90:60020-1] access.ZKPermissionWatcher: Updating permissions cache from node tablefour_cell_version_snapshots_copy with data: PBUF ... {code} Notice that the threads are different: the first one is the ZK event notification thread, while the other is the thread from the OpenRegionHandler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
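One way to see the stale-overwrite hazard above is to tag each cached entry with the znode version and refuse to apply an older update over a newer one. This is an illustrative sketch only (hypothetical class; the actual ZKPermissionWatcher fix serializes refreshes differently):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of one way to avoid the stale-overwrite race described above: tag
// each cached entry with the znode version and ignore older updates.
// Illustrative only, not the actual ZKPermissionWatcher fix.
public class VersionedPermissionCache {
    private static final class Entry {
        final int version;
        final String perms;
        Entry(int version, String perms) {
            this.version = version;
            this.perms = perms;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    // Returns true if applied, false if the update was stale and ignored.
    public boolean update(String table, String perms, int znodeVersion) {
        Entry applied = cache.merge(table, new Entry(znodeVersion, perms),
                (oldE, newE) -> oldE.version >= newE.version ? oldE : newE);
        return applied.version == znodeVersion;
    }

    public String get(String table) {
        Entry e = cache.get(table);
        return e == null ? null : e.perms;
    }

    public static void main(String[] args) {
        VersionedPermissionCache cache = new VersionedPermissionCache();
        cache.update("loadtest_d1", "hrt_qa: RWXCA", 2); // ZK event thread, newer
        cache.update("loadtest_d1", "", 1);              // open-region thread, stale
        System.out.println(cache.get("loadtest_d1"));    // hrt_qa: RWXCA
    }
}
```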
[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer
[ https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193376#comment-14193376 ] Hudson commented on HBASE-12399: SUCCESS: Integrated in HBase-TRUNK #5735 (See [https://builds.apache.org/job/HBase-TRUNK/5735/]) HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev b5764a8e74179bfc0c09a416d51271116b903c2c) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java Master startup race between metrics and RpcServer - Key: HBASE-12399 URL: https://issues.apache.org/jira/browse/HBASE-12399 Project: HBase Issue Type: Bug Components: master Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: 12399.patch, HBASE-12399.00.patch Seeing this on CM tests with frequent master thrashing {noformat} 2014-10-31 12:01:59,196 ERROR [Timer for 'HBase' metrics system] impl.MetricsSourceAdapter: Error getting metrics from source IPC,sub=IPC java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.FifoRpcScheduler.getGeneralQueueLength(FifoRpcScheduler.java:81) at org.apache.hadoop.hbase.ipc.MetricsHBaseServerWrapperImpl.getGeneralQueueLength(MetricsHBaseServerWrapperImpl.java:43) at org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.getMetrics(MetricsHBaseServerSourceImpl.java:117) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:382) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:369) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) {noformat} -- This message was sent by Atlassian 
JIRA (v6.3.4#6332)
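The NPE arises because the metrics timer can poll the wrapper before the RpcServer has assigned its scheduler. A sketch of the defensive pattern (illustrative stand-in classes, not the actual MetricsHBaseServerWrapperImpl code) is to null-check the partially constructed server and report a neutral value instead of throwing:

```java
public class SafeMetricsWrapper {
    static class RpcScheduler {
        int getGeneralQueueLength() { return 7; }  // stand-in metric value
    }

    static class RpcServer {
        volatile RpcScheduler scheduler;  // may still be null during startup
    }

    private final RpcServer server;

    SafeMetricsWrapper(RpcServer server) {
        this.server = server;
    }

    int getGeneralQueueLength() {
        // A metrics timer can fire in the window between server construction
        // and scheduler assignment, so guard instead of dereferencing blindly.
        if (server == null || server.scheduler == null) {
            return 0;  // neutral value rather than an NPE in the metrics thread
        }
        return server.scheduler.getGeneralQueueLength();
    }
}
```

Once the scheduler is assigned, the getter transparently starts reporting real values; no synchronization is needed beyond the volatile field.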
[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout
[ https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193375#comment-14193375 ] Hudson commented on HBASE-12403: SUCCESS: Integrated in HBase-TRUNK #5735 (See [https://builds.apache.org/job/HBase-TRUNK/5735/]) HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout (ndimiduk: rev 3c06b48181e22eb4ce91d6d8a455a1617f13d85f) * hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java * hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java IntegrationTestMTTR flaky due to aggressive RS restart timeout -- Key: HBASE-12403 URL: https://issues.apache.org/jira/browse/HBASE-12403 Project: HBase Issue Type: Test Components: integration tests Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12403.00.patch TL;DR: the CM RestartRS action timeout is only 60 seconds. Considering the RS must connect to the Master before it can be online, this is not enough time in an environment where the Master can also be killed. Failure from the console says the test failed because a RestartRsHoldingMetaAction timed out. 
{noformat} Caused by: java.io.IOException: did timeout waiting for region server to start:ip-172-31-42-248.ec2.internal at org.apache.hadoop.hbase.HBaseCluster.waitForRegionServerToStart(HBaseCluster.java:153) at org.apache.hadoop.hbase.chaos.actions.Action.startRs(Action.java:93) at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:52) at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:559) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:550) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} This is only reported at the end of the test run; there's no indication as to when during the run the failure happened. The timeout on the start-RS operation is 60 seconds. Hacking out the start/stop messages from the logs during the time window when this test ran, it appears that at one point the RS took 2min 12s between when it was launched and when it reported for duty: {noformat} Fri Oct 31 14:53:17 UTC 2014 Starting regionserver on ip-172-31-42-248 2014-10-31 14:55:29,049 INFO [regionserver60020] regionserver.HRegionServer: Serving as ip-172-31-42-248.ec2.internal,60020,1414767238992, RpcServer on ip-172-31-42-248.ec2.internal/172.31.42.248:60020, sessionid=0x249661c2b7b0118 {noformat} The RS came up without incident. It spent 1min 4s of that time waiting on the master to start, then attempted to report for duty from 14:54:28 to 14:55:24. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
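The timeout logic being tuned here boils down to a poll-until-deadline loop. A minimal sketch in the spirit of HBaseCluster.waitForRegionServerToStart (the names and signature are assumptions, not the real API) shows where the 60-second knob lives:

```java
import java.util.function.BooleanSupplier;

public class WaitForStartup {
    // Polls the liveness check until it succeeds or the deadline passes.
    static boolean waitFor(BooleanSupplier isOnline, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (isOnline.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        // The caller turns false into an IOException; raising timeoutMs is
        // the adjustment this issue makes, since a restarting RS may
        // legitimately block waiting on the Master before going online.
        return false;
    }
}
```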
[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer
[ https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193408#comment-14193408 ] Hudson commented on HBASE-12399: FAILURE: Integrated in HBase-1.0 #405 (See [https://builds.apache.org/job/HBase-1.0/405/]) HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev c3a7f2f3bbb2a12bfffeff6d181e619a1545c41a) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout
[ https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193407#comment-14193407 ] Hudson commented on HBASE-12403: FAILURE: Integrated in HBase-1.0 #405 (See [https://builds.apache.org/job/HBase-1.0/405/]) HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout (ndimiduk: rev 687710eb2869817952461796d04e35de29a98fdb) * hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java * hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer
[ https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193437#comment-14193437 ] Hudson commented on HBASE-12399: FAILURE: Integrated in HBase-0.98 #647 (See [https://builds.apache.org/job/HBase-0.98/647/]) HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev da145ae2da11d0b59f47ca78bb26c166a84bf386) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout
[ https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193436#comment-14193436 ] Hudson commented on HBASE-12403: FAILURE: Integrated in HBase-0.98 #647 (See [https://builds.apache.org/job/HBase-0.98/647/]) HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout (ndimiduk: rev 414bed7197097db4e2ce638f46d9996fdfb305b1) * hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java * hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12403) IntegrationTestMTTR flaky due to aggressive RS restart timeout
[ https://issues.apache.org/jira/browse/HBASE-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193450#comment-14193450 ] Hudson commented on HBASE-12403: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #615 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/615/]) HBASE-12403 IntegrationTestMTTR flaky due to aggressive RS restart timeout (ndimiduk: rev 414bed7197097db4e2ce638f46d9996fdfb305b1) * hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java * hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12399) Master startup race between metrics and RpcServer
[ https://issues.apache.org/jira/browse/HBASE-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193451#comment-14193451 ] Hudson commented on HBASE-12399: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #615 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/615/]) HBASE-12399 Master startup race between metrics and RpcServer (ndimiduk: rev da145ae2da11d0b59f47ca78bb26c166a84bf386) * hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/MetricsHBaseServerWrapperImpl.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12363) KEEP_DELETED_CELLS considered harmful?
[ https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193510#comment-14193510 ] Lars Hofhansl commented on HBASE-12363: --- Obviously that is not going to work. The old code would interpret that as not true (i.e. false) and have KEEP_DELETED_CELLS disabled. One would have to be aware of that before enabling the new feature. I also need to fix the long lines and put an interface annotation/comment/license into the KeepDeletedCells enum. KEEP_DELETED_CELLS considered harmful? -- Key: HBASE-12363 URL: https://issues.apache.org/jira/browse/HBASE-12363 Project: HBase Issue Type: Sub-task Components: regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Labels: Phoenix Attachments: 12363-master.txt, 12363-test.txt Brainstorming... This morning in the train (of all places) I realized a fundamental issue in how KEEP_DELETED_CELLS is implemented. The problem is around knowing when it is safe to remove a delete marker (we cannot remove it until all cells affected by it are removed). This was particularly hard for family markers, since they sort before all cells of a row, and hence, scanning forward through an HFile, you cannot know whether the family markers are still needed until at least the entire row is scanned. My solution was to keep the TS of the oldest put in any given HFile, and only remove delete markers older than that TS. That sounds good on the face of it... But now imagine you wrote a version of ROW 1 and then never updated it again. Then later you write a billion other rows and delete them all. Since the TS of the cells in ROW 1 is older than all the delete markers for the other billion rows, these will never be collected... At least for the region that hosts ROW 1, after a major compaction. 
Note, in a sense that is what HBase is supposed to do when keeping deleted cells: keep them until they would be removed by some other means (for example TTL, or MAX_VERSIONS when new versions are inserted). The specific problem here is that even when all KVs affected by a delete marker are expired this way, the marker would not be removed if there is just one older KV in the HStore. I don't see a good way out of this. In the parent issue I outlined these four options: # Only allow the new flag to be set on CFs with TTL set. MIN_VERSIONS would not apply to deleted rows or delete marker rows (we wouldn't know how long to keep family deletes in that case). (MAX)VERSIONS would still be enforced on all row types except for family delete markers. # Translate family delete markers to column delete markers at (major) compaction time. # Change HFileWriterV* to keep track of the earliest put TS in a store and write it to the file metadata. Use that to expire delete markers that are older and hence can't affect any puts in the file. # Have Store.java keep track of the earliest put in internalFlushCache and compactStore and then append it to the file metadata. That way HFileWriterV* would not need to know about KVs. And I implemented #4. I'd love to get input on ideas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
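Option #4 above can be sketched in a few lines: track the minimum put timestamp while writing a store file, and at compaction time purge only delete markers strictly older than it, since such a marker cannot mask any put in the file. This is an illustrative sketch under those assumptions, not the actual Store/HFileWriter code:

```java
public class EarliestPutTracker {
    private long earliestPutTs = Long.MAX_VALUE;

    // Called for every put written to the file (flush or compaction output).
    void onPut(long ts) {
        earliestPutTs = Math.min(earliestPutTs, ts);
    }

    // A delete marker affects only cells with a timestamp <= its own, so a
    // marker strictly older than every put in the file masks nothing and can
    // be purged even with KEEP_DELETED_CELLS enabled.
    boolean canPurgeDeleteMarker(long markerTs) {
        return markerTs < earliestPutTs;
    }
}
```

The pathological case in the description follows directly: one ancient put pins earliestPutTs low forever, so no later delete marker ever satisfies the purge condition.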
[jira] [Reopened] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors
[ https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-12219: --- Reverted branch-1 patch and addendum. Builds are unstable starting w/ this patch going in. I'm reverting till the build is back to stable again, then will put stuff back. Cache more efficiently getAll() and get() in FSTableDescriptors --- Key: HBASE-12219 URL: https://issues.apache.org/jira/browse/HBASE-12219 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.24, 0.99.1, 0.98.6.1 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: scalability Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png Currently table descriptors and tables are cached once they are accessed for the first time. Subsequent calls to the master only require a trip to HDFS to look up the modified time in order to reload the table descriptors if they were modified. However, in clusters with a large number of tables or concurrent clients this can be too aggressive toward HDFS and the master, causing contention with other requests. A simple solution is a TTL-based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() that allows the master to process those calls faster without causing contention and without a trip to HDFS for every call to listTables() or getTableDescriptor(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
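The TTL-based cache described above can be sketched generically: within the TTL window the cached value is returned directly, and the loader (standing in for the HDFS modtime lookup) runs only on a miss or after expiry. A minimal sketch with illustrative names, not the patch itself:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final Function<K, V> loader;  // stands in for the HDFS lookup

    public TtlCache(long ttlMs, Function<K, V> loader) {
        this.ttlMs = ttlMs;
        this.loader = loader;
    }

    public V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> e = cache.get(key);
        if (e == null || now - e.loadedAt > ttlMs) {
            // Miss or expired: reload once and remember when we did.
            e = new Entry<>(loader.apply(key), now);
            cache.put(key, e);
        }
        return e.value;
    }
}
```

The trade-off is bounded staleness: a descriptor change can go unnoticed for up to the TTL, in exchange for collapsing a burst of lookups into one HDFS trip.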
[jira] [Created] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
Jeffrey Zhong created HBASE-12407: - Summary: HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES Key: HBASE-12407 URL: https://issues.apache.org/jira/browse/HBASE-12407 Project: HBase Issue Type: Bug Affects Versions: 0.99.1, 0.98.7, 2.0.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong This causes an HTable instance created with a custom RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting to internally use a cached connection that lacks this custom setting, because CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
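Why the missing property matters can be shown with a simplified model of how HConnectionKey works: the key is derived only from a whitelist of properties, so two configurations that differ only in a property outside that list produce equal keys and share one cached connection. This is an illustrative sketch (the property names are assumptions for the example), not the real HConnectionKey code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConnKeySketch {
    // Hypothetical property list; the bug is that the controller-factory
    // property is absent, so it never influences the connection key.
    static final List<String> CONNECTION_PROPERTIES = Arrays.asList(
        "hbase.zookeeper.quorum"
        // the RPC-controller-factory property would need to be listed here
    );

    // Builds the cache key from only the whitelisted properties.
    static Map<String, String> keyOf(Map<String, String> conf) {
        Map<String, String> key = new TreeMap<>();
        for (String p : CONNECTION_PROPERTIES) {
            if (conf.containsKey(p)) {
                key.put(p, conf.get(p));
            }
        }
        return key;
    }
}
```

The fix is simply adding the missing property to the whitelist, so differing controller-factory settings hash to distinct keys and get distinct connections.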
[jira] [Updated] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
[ https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12407: -- Attachment: HBASE-12407.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
[ https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12407: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
[ https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193593#comment-14193593 ] Ted Yu commented on HBASE-12407: +1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
[ https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193597#comment-14193597 ] Enis Soztutar commented on HBASE-12407: --- This looks good. Remember that cached/managed connections are going away, so we should switch to using the new style of connections in Phoenix in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors
[ https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193614#comment-14193614 ] Hudson commented on HBASE-12219: SUCCESS: Integrated in HBase-1.0 #406 (See [https://builds.apache.org/job/HBase-1.0/406/]) HBASE-12219 Cache more efficiently getAll() and get() in FSTableDescriptors; REVERTgit log! branch-1 patch AND addendum (stack: rev 0aca51e89cd0fe69d9cd57648949df5c5b506c53) * hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestFSTableDescriptors.java * hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java Cache more efficiently getAll() and get() in FSTableDescriptors --- Key: HBASE-12219 URL: https://issues.apache.org/jira/browse/HBASE-12219 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.24, 0.99.1, 0.98.6.1 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: scalability Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png Currently table descriptors and tables are cached once they are accessed for the first time. Next calls to the master only require a trip to HDFS to lookup the modified time in order to reload the table descriptors if modified. 
However, in clusters with a large number of tables or many concurrent clients, this can be too aggressive toward HDFS and the master, causing contention that slows the processing of other requests. A simple solution is a TTL-based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() that lets the master process those calls faster, without causing contention and without a trip to HDFS for every call to listtables() or getTableDescriptor(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
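The TTL-based cache proposed in the description can be sketched as follows. This is a minimal, standalone Java sketch under assumed names (TtlCache, Loader), not HBase's actual FSTableDescriptors code; the loader stands in for the expensive HDFS round trip, which only stale or missing entries pay:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TtlCache<K, V> {
    /** Stands in for the expensive lookup (here: the HDFS round trip). */
    public interface Loader<K, V> {
        V load(K key);
    }

    private static class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Loader<K, V> loader;

    public TtlCache(long ttlMillis, Loader<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    public V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> e = cache.get(key);
        // Reload only when the entry is missing or older than the TTL;
        // within the TTL window every call is served from memory.
        if (e == null || now - e.loadedAt > ttlMillis) {
            e = new Entry<>(loader.load(key), now);
            cache.put(key, e);
        }
        return e.value;
    }
}
```

The trade-off is staleness bounded by the TTL: a descriptor modified on HDFS is not observed until the cached entry expires, which is acceptable for read-heavy calls like getAll().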
[jira] [Commented] (HBASE-12407) HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES
[ https://issues.apache.org/jira/browse/HBASE-12407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193638#comment-14193638 ] Hadoop QA commented on HBASE-12407: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678737/HBASE-12407.patch against trunk revision . ATTACHMENT ID: 12678737 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFastFail Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11557//console This message is automatically 
generated. HConnectionKey doesn't contain CUSTOM_CONTROLLER_CONF_KEY in CONNECTION_PROPERTIES --- Key: HBASE-12407 URL: https://issues.apache.org/jira/browse/HBASE-12407 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.7, 0.99.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-12407.patch An HTable instance created with a custom RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY conf setting may internally use a cached connection that lacks this custom setting, because CUSTOM_CONTROLLER_CONF_KEY isn't part of HConnectionKey.CONNECTION_PROPERTIES -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12398) Region isn't assigned in an extreme race condition
[ https://issues.apache.org/jira/browse/HBASE-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193657#comment-14193657 ] Jimmy Xiang commented on HBASE-12398: - The master branch should not have such a problem because only master updates the region states (step b won't happen). So I think we don't need a patch for master. Region isn't assigned in an extreme race condition -- Key: HBASE-12398 URL: https://issues.apache.org/jira/browse/HBASE-12398 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.98.7 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-12398.patch In a test, [~enis] has seen a condition which made one of the regions unassigned. The client failed since the region is not online anywhere: {code} 2014-10-29 01:51:40,731 WARN [HBaseReaderThread_13] util.MultiThreadedReader: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions: Wed Oct 29 01:39:51 UTC 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@cc21330, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9. 
is not online on hor8n08.gq1.ygridcore.net,60020,1414546670414 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774) at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2906) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29990) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) at java.lang.Thread.run(Thread.java:722) {code} The root cause of the issue is an extreme race condition: a) a region is about to open and receives a closeRpc request triggered by a second re-assignment; b) the second re-assignment updates the region state to offline, which is immediately overwritten to OPEN by the ZK opened notification from the previous region open; c) when the region is reopened on the same RS by the second assignment, the AM forces the region to close since its region state isn't in the PendingOpenOrOpening state; d) the region ends up offline and can't serve any request. Region Server Side: 1) The RS (hor8n10) has almost finished opening region 689b77e1bad7e951b0d9ef4663b217e9 when it receives a closeRegion request. {noformat} 2014-10-29 01:39:43,153 INFO [PriorityRpcServer.handler=2,queue=0,port=60020] regionserver.HRegionServer: Received CLOSE for the region:689b77e1bad7e951b0d9ef4663b217e9 , which we are already trying to OPEN. Cancelling OPENING. {noformat} 2) Since region 689b77e1bad7e951b0d9ef4663b217e9 was already opened apart from some final steps, the RS logs the following message and closes 689b77e1bad7e951b0d9ef4663b217e9 immediately after updating the ZK node state to 'OPENED'.
{noformat} 2014-10-29 01:39:43,198 ERROR [RS_OPEN_REGION-hor8n10:60020-0] handler.OpenRegionHandler: Race condition: we've finished to open a region, while a close was requested on region=IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9.. It can be a critical error, as a region that should be closed is now opened. Closing it now {noformat} In Master Server Side: {noformat} 2014-10-29 01:39:43,177 DEBUG [AM.ZK.Worker-pool2-t55] master.AssignmentManager: Handling RS_ZK_REGION_OPENED, server=hor8n10.gq1.ygridcore.net,60020,1414546531945, region=689b77e1bad7e951b0d9ef4663b217e9, current_state={689b77e1bad7e951b0d9ef4663b217e9 state=OPENING, ts=1414546783152, server=hor8n10.gq1.ygridcore.net,60020,1414546531945} 2014-10-29 01:39:43,255 DEBUG [AM.-pool1-t16] master.AssignmentManager: Offline IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9., it's not any more on hor8n10.gq1.ygridcore.net,60020,1414546531945 2014-10-29 01:39:43,942 DEBUG [AM.ZK.Worker-pool2-t58] master.AssignmentManager: Handling RS_ZK_REGION_OPENED, server=hor8n10.gq1.ygridcore.net,60020,1414546531945, region=689b77e1bad7e951b0d9ef4663b217e9, current_state={689b77e1bad7e951b0d9ef4663b217e9 state=OPEN, ts=1414546783387, server=hor8n10.gq1.ygridcore.net,60020,1414546531945}
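The overwrite in step b) — a stale "OPENED" notification clobbering a newer OFFLINE state — could in principle be prevented by making each state transition conditional on the state the event was generated against. The following is a hypothetical, self-contained Java sketch of that guard, not HBase's actual AssignmentManager (the RegionStateGuard class and its State enum are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

public class RegionStateGuard {
    public enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.OFFLINE);

    // Transition only if the region is still in the state the event
    // expected; a stale notification simply loses the compare-and-set
    // race instead of overwriting a newer state.
    public boolean transition(State expected, State next) {
        return state.compareAndSet(expected, next);
    }

    public State current() {
        return state.get();
    }
}
```

With this guard, the sequence in the logs plays out safely: the second re-assignment moves OPENING to OFFLINE, and the late ZK opened notification (which expected OPENING) is rejected rather than flipping the region back to OPEN.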
[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors
[ https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193669#comment-14193669 ] stack commented on HBASE-12219: --- Builds on branch-1 are blue again after backing this out. I think this is the zombie maker. Leaving open till we figure why. Cache more efficiently getAll() and get() in FSTableDescriptors --- Key: HBASE-12219 URL: https://issues.apache.org/jira/browse/HBASE-12219 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.24, 0.99.1, 0.98.6.1 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: scalability Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png Currently table descriptors and tables are cached once they are accessed for the first time. Subsequent calls to the master only require a trip to HDFS to look up the modified time in order to reload the table descriptors if modified. However, in clusters with a large number of tables or many concurrent clients, this can be too aggressive toward HDFS and the master, causing contention that slows the processing of other requests. A simple solution is a TTL-based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() that lets the master process those calls faster, without causing contention and without a trip to HDFS for every call to listtables() or getTableDescriptor(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193670#comment-14193670 ] stack commented on HBASE-12285: --- [~dimaspivak] Open a new issue instead? The surefire snapshot and the culling of the logs put us in a better place for sure. We have mostly blues now when we build. We've been failing since #400 because of HBASE-12219. Was this causing the "There was a timeout or other error in the fork" failures? Good on you, Dima. Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Fix For: 2.0.0, 0.99.2 Attachments: HBASE-12285_branch-1_v1.patch, HBASE-12285_branch-1_v1.patch Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12219) Cache more efficiently getAll() and get() in FSTableDescriptors
[ https://issues.apache.org/jira/browse/HBASE-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193693#comment-14193693 ] Dima Spivak commented on HBASE-12219: - Using [~manukranthk]'s awesome findHangingTests script, it looks like the set of runs that were red all had org.apache.hadoop.hbase.client.TestAdmin hang, which caused the Surefire-forked process to time out after 15 minutes and fail the Maven build. Cache more efficiently getAll() and get() in FSTableDescriptors --- Key: HBASE-12219 URL: https://issues.apache.org/jira/browse/HBASE-12219 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.24, 0.99.1, 0.98.6.1 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Labels: scalability Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12219-0.98.patch, HBASE-12219-0.98.v1.patch, HBASE-12219-0.99.addendum.patch, HBASE-12219-0.99.patch, HBASE-12219-v1.patch, HBASE-12219-v1.patch, HBASE-12219.v0.txt, HBASE-12219.v2.patch, HBASE-12219.v3.patch, list.png Currently table descriptors and tables are cached once they are accessed for the first time. Subsequent calls to the master only require a trip to HDFS to look up the modified time in order to reload the table descriptors if modified. However, in clusters with a large number of tables or many concurrent clients, this can be too aggressive toward HDFS and the master, causing contention that slows the processing of other requests. A simple solution is a TTL-based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() that lets the master process those calls faster, without causing contention and without a trip to HDFS for every call to listtables() or getTableDescriptor(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dima Spivak resolved HBASE-12285. - Resolution: Fixed You're right, [~stack]. Sorry for being quick to reopen, was just paranoid. But yay for CI actually helping us track down faulty commits! :) Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Fix For: 2.0.0, 0.99.2 Attachments: HBASE-12285_branch-1_v1.patch, HBASE-12285_branch-1_v1.patch Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)