[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShiXing updated HBASE-3725: --- Attachment: HBASE-3725-0.92-V6.patch To Ted: bq. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment. That is because iscan also reads from the memstore once I remove this code: {code} List<KeyValue> fileResults = new ArrayList<KeyValue>(); - iscan.checkOnlyStoreFiles(); scanner = null; try { scanner = getScanner(iscan); {code} Since there is then no result in the memstore, increment treats the value as 0, which has the same effect as a delete. I added this case to TestHRegion#testIncrementWithFlushAndDelete in V6. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce: Create a row r. Set column x to some default value. Force HBase to write that value to the file system (such as by restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value. Get the row. The returned value in the column was incremented from the old value before the row was deleted, instead of being initialized to some_value.
Code to reproduce:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {

  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in HBase.
   * Usage: First run part one by passing '1' as the first arg.
   * Then restart the HBase cluster so it writes everything to disk.
   * Run part two by passing '2' as the first arg.
   *
   * This will result in the old deleted data being found and used for the increment calls.
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column.
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);
    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
      Increment inc = new Increment(rowKey);
      inc.addColumn(infoCF, newInc, (long)10);
      table.increment(inc);
    }
    Get get = new Get(rowKey);
    Result r = table.get(get);
    System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc))
        + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
  }

  /**
   * First deletes the data then increments the column 10 times by 1 each time.
   *
   * Should result in a value of 10 but it doesn't, it results in
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418968#comment-13418968 ] Hadoop QA commented on HBASE-6411: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537268/HBASE-6411-0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 16 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//console This message is automatically generated. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6433) improve getRemoteAddress
binlijin created HBASE-6433: --- Summary: improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6429: -- Status: Patch Available (was: Open) Filter with filterRow() returning true is also incompatible with scan with limit Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
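The rule the issue extends can be sketched as a guard applied when the scan is set up. The sketch below is illustrative only, not the actual HBase implementation: `RowFilter` and `checkScan` are hypothetical stand-ins for `Filter.hasFilterRow()` and the region scanner's compatibility check.

```java
// Toy sketch (not HBase code): a scan with a row limit cannot be combined
// with a filter that needs to see the whole row via filterRow(), so the
// guard throws IncompatibleFilterException, matching the behavior requested.
public class ScanLimitCheck {

    // Stand-in for org.apache.hadoop.hbase.filter.Filter#hasFilterRow().
    interface RowFilter {
        boolean hasFilterRow();
    }

    static class IncompatibleFilterException extends RuntimeException {
        IncompatibleFilterException(String msg) { super(msg); }
    }

    // Guard at scan setup: reject limit + filterRow() together.
    static void checkScan(int limit, RowFilter filter) {
        if (limit > 0 && filter != null && filter.hasFilterRow()) {
            throw new IncompatibleFilterException(
                "Cannot set a scan limit while filterRow() is implemented");
        }
    }

    public static void main(String[] args) {
        RowFilter rowFilter = () -> true; // a filter that overrides filterRow()
        try {
            checkScan(10, rowFilter);
            System.out.println("ok");
        } catch (IncompatibleFilterException e) {
            System.out.println("incompatible");
        }
    }
}
```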
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419046#comment-13419046 ] Zhihong Ted Yu commented on HBASE-3725: --- How about renaming leftResults to remainingResults? Please prepare a patch for trunk. Thanks. HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce: Create a row r. Set column x to some default value. Force HBase to write that value to the file system (such as by restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value. Get the row. The returned value in the column was incremented from the old value before the row was deleted, instead of being initialized to some_value.
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6433: Attachment: HBASE-6433-trunk.patch improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6433: Attachment: HBASE-6433-94.patch HBASE-6433-90.patch HBASE-6433-92.patch improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419049#comment-13419049 ] Hadoop QA commented on HBASE-6429: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537300/hbase-6429-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestCheckTestClasses Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2419//console This message is automatically generated. 
Filter with filterRow() returning true is also incompatible with scan with limit Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6433: Description: Without this patch it costs 4000ns, with this patch it costs 1600ns improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
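The 4000 ns vs. 1600 ns figures in the description suggest a simple per-call timing. As a hedged illustration of how such numbers might be measured, the harness below averages System.nanoTime() over many iterations; getRemoteAddressSlow and getRemoteAddressFast are invented stand-ins (recompute vs. cache), not the patch's actual code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative micro-benchmark harness, NOT the HBASE-6433 patch itself.
public class RemoteAddressBench {

    // Stand-in "before": format the address string on every call.
    static String getRemoteAddressSlow(InetAddress addr) {
        return addr.getHostAddress();
    }

    // Stand-in "after": compute once per connection and cache the result.
    static String cached;
    static String getRemoteAddressFast(InetAddress addr) {
        if (cached == null) cached = addr.getHostAddress();
        return cached;
    }

    // Average nanoseconds per call over `iters` iterations.
    static long timePerCall(Runnable r, int iters) {
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) r.run();
        return (System.nanoTime() - start) / iters;
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName("127.0.0.1");
        int iters = 1_000_000;
        System.out.println("slow ns/call: " + timePerCall(() -> getRemoteAddressSlow(addr), iters));
        System.out.println("fast ns/call: " + timePerCall(() -> getRemoteAddressFast(addr), iters));
    }
}
```

Absolute numbers depend on the JVM and hardware; the point is only that caching removes repeated work from the hot path.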
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6433: -- Status: Patch Available (was: Open) improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6429: -- Summary: Filter with filterRow() returning true is incompatible with scan with limit (was: Filter with filterRow() returning true is also incompatible with scan with limit) Filter with filterRow() returning true is incompatible with scan with limit --- Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419124#comment-13419124 ] Zhihong Ted Yu commented on HBASE-6429: --- TestFilterWithScanLimits.java and FilterWrapper.java need the Apache license header. {code} +if(null == filter) { {code} There should be a space between if and (. Why does TestFilterWithScanLimits have a main() method? It should be classified as a medium test. Filter with filterRow() returning true is incompatible with scan with limit --- Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6429: -- Status: Open (was: Patch Available) Filter with filterRow() returning true is incompatible with scan with limit --- Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented.
[jira] [Commented] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419152#comment-13419152 ] Zhihong Ted Yu commented on HBASE-6433: --- The trunk patch contains a reordering of imports, which distracts from the goal of this JIRA. Otherwise the patch looks good. improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Assigned] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu reassigned HBASE-6433: - Assignee: binlijin improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6433: -- Fix Version/s: 0.96.0 Hadoop Flags: Reviewed improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Commented] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419154#comment-13419154 ] Hadoop QA commented on HBASE-6433: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537328/HBASE-6433-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2420//console This message is automatically generated. improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Updated] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6433: -- Attachment: 6433-getRemoteAddress-trunk.txt Simplified patch for trunk. improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419207#comment-13419207 ] Zhihong Ted Yu commented on HBASE-5547: --- Will integrate in 3 hours if there is no objection. Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Jesse Yates Fix For: 0.94.2 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, java_HBASE-5547_v7.patch This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode, for example) and in that case either: 1. rename HFiles to be deleted to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be possible to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder.
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419255#comment-13419255 ] Zhihong Ted Yu commented on HBASE-3725: --- In trunk, the getLastIncrement() call has been replaced with: {code} List<KeyValue> results = get(get, false); {code} HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: Jonathan Gray Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch Deleted row values are sometimes used as starting points for new increments. To reproduce: Create a row r. Set column x to some default value. Force HBase to write that value to the file system (such as by restarting the cluster). Delete the row. Call table.incrementColumnValue with some_value. Get the row. The returned value in the column was incremented from the old value before the row was deleted, instead of being initialized to some_value.
Code to reproduce: {code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestIncrement {

  static String tableName = "testIncrement";
  static byte[] infoCF = Bytes.toBytes("info");
  static byte[] rowKey = Bytes.toBytes("test-rowKey");
  static byte[] newInc = Bytes.toBytes("new");
  static byte[] oldInc = Bytes.toBytes("old");

  /**
   * This code reproduces a bug with increment column values in HBase.
   * Usage: First run part one by passing '1' as the first arg.
   * Then restart the HBase cluster so it writes everything to disk.
   * Run part two by passing '2' as the first arg.
   *
   * This will result in the old deleted data being found and used for the increment calls.
   *
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    if ("1".equals(args[0]))
      partOne();
    if ("2".equals(args[0]))
      partTwo();
    if ("both".equals(args[0])) {
      partOne();
      partTwo();
    }
  }

  /**
   * Creates a table and increments a column value 10 times by 10 each time.
   * Results in a value of 100 for the column.
   *
   * @throws IOException
   */
  static void partOne() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    tableDesc.addFamily(new HColumnDescriptor(infoCF));
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);
      admin.deleteTable(tableName);
    }
    admin.createTable(tableDesc);

    HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
    HTableInterface table = pool.getTable(Bytes.toBytes(tableName));

    // Increment uninitialized column
    for (int j = 0; j < 10; j++) {
      table.incrementColumnValue(rowKey, infoCF, oldInc, (long) 10);
      Increment inc = new Increment(rowKey);
      inc.addColumn(infoCF, newInc, (long) 10);
      table.increment(inc);
    }

    Get get = new Get(rowKey);
    Result r = table.get(get);
    System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc))
        + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
  }

  /**
   * First deletes the data, then increments the column 10 times by 1 each time.
   * Should result in a value of 10, but it doesn't; it results in a value of 110.
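The partTwo() method is truncated in the message above; per its javadoc it deletes the row and then increments by 1 ten times, expecting 10 but observing 110. The mechanism can be reproduced without a cluster using a toy model of memstore/store-file reads. This is a sketch of the failure mode only, not actual HBase code: the class, method names, and the null-as-tombstone convention are all invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of HBASE-3725 (NOT HBase internals): the buggy increment read
// path skipped the delete marker in the memstore and fell through to the
// flushed store files, resurrecting the old on-disk value.
public class Main {
    final Map<String, Long> storeFiles = new HashMap<>(); // flushed, survives restart
    final Map<String, Long> memstore = new HashMap<>();   // recent puts; null = tombstone

    void put(String col, long v) { memstore.put(col, v); }
    void delete(String col)     { memstore.put(col, null); }

    void flush() { // persist puts, apply tombstones, empty the memstore
        for (Map.Entry<String, Long> e : memstore.entrySet()) {
            if (e.getValue() == null) storeFiles.remove(e.getKey());
            else storeFiles.put(e.getKey(), e.getValue());
        }
        memstore.clear();
    }

    // Buggy read: a tombstone looks like "nothing in the memstore", so the
    // old value is read back from the store files.
    long buggyRead(String col) {
        Long v = memstore.get(col);
        if (v != null) return v;
        Long disk = storeFiles.get(col);
        return disk == null ? 0L : disk;
    }

    // Fixed read: the tombstone masks anything older sitting on disk.
    long fixedRead(String col) {
        if (memstore.containsKey(col)) {
            Long v = memstore.get(col);
            return v == null ? 0L : v;
        }
        Long disk = storeFiles.get(col);
        return disk == null ? 0L : disk;
    }

    long increment(String col, long amount, boolean fixed) {
        long next = (fixed ? fixedRead(col) : buggyRead(col)) + amount;
        memstore.put(col, next);
        return next;
    }

    // Repro from the report: value 100 on disk, delete, then 10 increments of 1.
    static long run(boolean fixed) {
        Main region = new Main();
        region.put("old", 100);
        region.flush();          // "restart the cluster so it writes everything to disk"
        region.delete("old");
        long last = 0;
        for (int j = 0; j < 10; j++) last = region.increment("old", 1, fixed);
        return last;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + run(false)); // 110, the reported bug
        System.out.println("fixed: " + run(true));  // 10, the expected value
    }
}
```

The buggy path reads 100 from disk on the first post-delete increment and accumulates from there, matching the 110 seen in the report.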
[jira] [Assigned] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu reassigned HBASE-3725: - Assignee: ShiXing (was: Jonathan Gray) HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: ShiXing Fix For: 0.92.2 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-3725: -- Fix Version/s: 0.92.2 HBase increments from old value after delete and write to disk -- Key: HBASE-3725 URL: https://issues.apache.org/jira/browse/HBASE-3725 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.90.1 Reporter: Nathaniel Cook Assignee: ShiXing Fix For: 0.92.2 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419263#comment-13419263 ] Andrew Purtell commented on HBASE-6432: --- Seems reasonable and low risk to pull the ID from ZooKeeper. HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 Attachments: HBASE-6432_94.patch ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of an HRegionServer this is bypassed and the clusterId is left at "default", since getMaster() uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem for clients (i.e. within a coprocessor) using delegation tokens for authentication: the token's service will be the correct clusterId, while the TokenSelector is looking for one with service "default". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-6432: -- Affects Version/s: 0.96.0 Fix Version/s: (was: 0.96.0) HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Francis Liu Assignee: Francis Liu Attachments: HBASE-6432_94.patch
[jira] [Comment Edited] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271 ] Andrew Purtell edited comment on HBASE-6432 at 7/20/12 3:44 PM: However, the master is responsible for publishing the cluster ID to ZooKeeper. If on a fresh install the regionservers are started first, then they won't find the ID up in ZK until the master comes up. I think this should be a Chore that retries until the ID is found then exits. was (Author: apurtell): However, the master is responsible for publishing the cluster ID to ZooKeeper. If on a fresh install the regionservers are started first, then they won't find the ID up in ZK. I think this should be a Chore that retries until the ID is found then exits.
[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271 ] Andrew Purtell commented on HBASE-6432: --- However, the master is responsible for publishing the cluster ID to ZooKeeper. If on a fresh install the regionservers are started first, then they won't find the ID up in ZK. I think this should be a Chore that retries until the ID is found then exits.
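The failure Francis describes is at bottom a lookup mismatch: delegation tokens are selected by their service string. A minimal model of why a selector searching for service "default" misses a token stored under the real cluster ID. This is an illustrative sketch, not Hadoop's actual Token/TokenSelector classes; the map, method name, and cluster-ID value are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Simplified stand-in for a token cache keyed by service string; the real
    // machinery is Hadoop's Token/TokenSelector, modeled loosely here.
    static final Map<String, String> tokensByService = new HashMap<>();

    static {
        // The token's service is set to the real cluster ID by the master...
        tokensByService.put("4f1c-cluster-uuid", "delegation-token");
    }

    static String selectToken(String service) {
        // A TokenSelector-like lookup: exact match on the service string.
        return tokensByService.get(service);
    }

    public static void main(String[] args) {
        // ...but with clusterId left at "default" in the regionserver's conf,
        // the selector searches for the wrong service and finds nothing.
        System.out.println(selectToken("default"));          // null -> auth fails
        System.out.println(selectToken("4f1c-cluster-uuid")); // token found
    }
}
```

Pulling the real cluster ID from ZooKeeper, as discussed above, makes the two service strings agree.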
[jira] [Commented] (HBASE-6428) Pluggable Compaction policies
[ https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419279#comment-13419279 ] Andrew Purtell commented on HBASE-6428: --- bq. For example one could envision storing old versions of a KV in separate HFiles, which then rarely have to be touched/cached by queries querying for new data. In addition these date-ranged HFiles can be easily used for backups while maintaining historical data. I'd be curious if you think the Coprocessor API for compactions cannot be reworked to handle this. Pluggable Compaction policies - Key: HBASE-6428 URL: https://issues.apache.org/jira/browse/HBASE-6428 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl For some use cases it is useful to allow more control over how KVs get compacted. For example one could envision storing old versions of a KV in separate HFiles, which then rarely have to be touched/cached by queries querying for new data. In addition these date-ranged HFiles can be easily used for backups while maintaining historical data. This would be a major change, allowing compactions to provide multiple targets (not just a filter).
[jira] [Commented] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419285#comment-13419285 ] Hadoop QA commented on HBASE-6433: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537359/6433-getRemoteAddress-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2421//console This message is automatically generated. improve getRemoteAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Without this patch it costs 4000ns, with this patch it costs 1600ns -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6434) Document effect of slow compressors on the flush path and workaround in the online book
Andrew Purtell created HBASE-6434: - Summary: Document effect of slow compressors on the flush path and workaround in the online book Key: HBASE-6434 URL: https://issues.apache.org/jira/browse/HBASE-6434 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor In HBASE-6423 Karthik writes bq. 1. flushing a memstore takes a while (GZIP compression) [... and the memstore gate comes crashing down] We once sidestepped this issue by specifying different compression options for flushes (LZO or none) and major compaction (BZIP2), disabling automatic major compaction, and managing major compaction from a shell-based process that iterates over each region on disk and makes some application-specific decisions. I go back and forth on whether this is a hack or legitimate HBase ops given how things currently work.
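The workaround Andrew describes can be expressed per column family, since HBase allows a separate codec for compaction output via the COMPRESSION_COMPACT attribute. A sketch in the HBase shell, assuming the codecs named are actually installed on the cluster (the table and family names are illustrative; the original comment used BZIP2 for major compactions, which would require a custom codec, so GZ stands in here):

```
# Fast (or no) compression on the flush path, heavier codec for compaction output.
create 't1', {NAME => 'cf', COMPRESSION => 'LZO', COMPRESSION_COMPACT => 'GZ'}
```

Combined with disabling automatic major compaction and triggering it externally, this keeps slow compressors off the flush path.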
[jira] [Commented] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419316#comment-13419316 ] Zhihong Ted Yu commented on HBASE-6433: --- I ran the two tests listed above and they passed: {code} Running org.apache.hadoop.hbase.master.TestSplitLogManager Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.933 sec Running org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 68.194 sec {code} Will integrate to trunk later today if there is no objection.
[jira] [Created] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
nkeywal created HBASE-6435: -- Summary: Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal HBase writes a Write-Ahead Log to recover from hardware failure. This log is written with 'append' on HDFS. Through ZooKeeper, HBase gets informed, usually within 30s, that it should start the recovery process. This means reading the Write-Ahead Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead only after ~10 minutes, the dead node still appears as available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still opened for writing, it adds an extra 20s, plus a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are:
- shorter dead-datanode detection by the NN. Requires an NN code change.
- better dead-datanode management in DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local one.
- reordering the blocks returned by the NN on the client side, to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround.
The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy, for two reasons:
- Some HDFS functions managing block ordering are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean.
- Adding a proxy allows putting all the code in HBase, simplifying dependency management.
Nevertheless, it would be better to have this in HDFS. But this solution allows targeting the latest version only, and could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution.
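The retained solution, reordering located blocks on the client side, amounts to a stable reorder of each block's replica list so the datanode co-located with the dead regionserver is tried last rather than first. A sketch of that reordering; the hostnames and the plain-string representation of a replica location are illustrative, not the actual HDFS LocatedBlock API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    /**
     * Stable reorder: keep the NN-supplied replica order, but move any
     * location on the (likely dead) host to the end, so it is still tried
     * as a last resort in case it is actually alive, instead of costing a
     * socket timeout up front.
     */
    static List<String> deprioritize(List<String> locations, String deadHost) {
        List<String> preferred = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String loc : locations) {
            (loc.equals(deadHost) ? suspect : preferred).add(loc);
        }
        preferred.addAll(suspect);
        return preferred;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("dn1:50010", "dn2:50010", "dn3:50010");
        // dn1 hosted the crashed regionserver, so try it last.
        System.out.println(deprioritize(replicas, "dn1:50010"));
    }
}
```

The proxy described above would apply this reorder to the block locations the NN returns before the DFS client opens the WAL file.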
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419356#comment-13419356 ] nkeywal commented on HBASE-6401: @stack bq. Does svn blame/git bisecting not turn up the issue that fixed this? Will try. HBase may lose edits after a crash if used with HDFS 1.0.3 or older --- Key: HBASE-6401 URL: https://issues.apache.org/jira/browse/HBASE-6401 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Priority: Critical Attachments: TestReadAppendWithDeadDN.java This comes from an HDFS bug, fixed in some HDFS versions. I haven't found the HDFS jira for this. Context: HBase Write-Ahead Log features. This is using hdfs append. If the node crashes, the file that was written is read by other processes to replay the actions. - So we have in hdfs one (dead) process writing with another process reading. - But, despite the call to syncFs, we don't always see the data when we have a dead node. It seems to be because the call in DFSClient#updateBlockInfo ignores the ipc errors and sets the length to 0. - So we may miss all the writes to the last block if we try to connect to the dead DN. hdfs 1.0.3, branch-1 or branch-1-win: we have the issue http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup hdfs branch-2 or trunk: we should not have the issue (but not tested) http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup The attached test will fail ~50% of the time.
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419358#comment-13419358 ] nkeywal commented on HBASE-6401: HBASE-6435 will lower the probability to get the issue but will not solve it totally.
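The edit-loss mechanism described in HBASE-6401, DFSClient#updateBlockInfo swallowing the IPC error and leaving the in-progress block length at 0, can be modeled abstractly. This is a sketch of the failure mode only, not HDFS code; the method names and the boolean stand-in for datanode liveness are invented for illustration:

```java
public class Main {
    // Stand-in for the RPC to the (dead) primary datanode asking for the
    // current length of the block still being written.
    static long fetchLastBlockLengthFromDN(boolean dnAlive, long realLength)
            throws Exception {
        if (!dnAlive) throw new Exception("connect timeout: datanode is dead");
        return realLength;
    }

    // Buggy reader-side logic: swallow the error and assume length 0,
    // silently hiding every edit in the last WAL block from the reader.
    static long buggyVisibleLength(boolean dnAlive, long realLength) {
        try {
            return fetchLastBlockLengthFromDN(dnAlive, realLength);
        } catch (Exception e) {
            return 0L; // the bug: an RPC failure is treated as "no data"
        }
    }

    public static void main(String[] args) {
        System.out.println(buggyVisibleLength(false, 4096)); // 0 -> edits lost
        System.out.println(buggyVisibleLength(true, 4096));  // 4096 -> edits seen
    }
}
```

HBASE-6435's reordering only makes the dead-DN connection less likely; as noted above, the correct fix is on the HDFS side, where the error must propagate so another replica can be consulted.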
[jira] [Commented] (HBASE-6433) improve getRemoteAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419360#comment-13419360 ] Jean-Daniel Cryans commented on HBASE-6433: --- Can we have a meaningful jira title with a meaningful description of the problem plus how it's being addressed?
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6411: Attachment: HBASE-6411-1.patch Adjusted Elliott's patch. Added example unit-test for MasterMetrics that verifies metrics value change. Had to create MetricsAsserts shim in test sources in compat modules. Please let me know what you think. Will try to extract maps in BaseMetricsSourceImpl(s) into separate class and add support for MetricTags. I guess we agreed on that previously. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Updated] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6433: -- Description: Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to call.connection.socket.getInetAddress(). The host address is actually stored in the HBaseServer.Connection.hostAddress field. We don't need to go through Socket to get this information. Without this patch it costs 4000ns, with this patch it costs 1600ns was: Without this patch it costs 4000ns, with this patch it costs 1600ns Summary: Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress (was: improve getRemoteAddress) @J-D: See if updated description suffices. Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to call.connection.socket.getInetAddress(). The host address is actually stored in the HBaseServer.Connection.hostAddress field. We don't need to go through Socket to get this information. Without this patch it costs 4000ns, with this patch it costs 1600ns
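The 4000ns-to-1600ns gain comes from resolving the peer address once when the connection is accepted and reading a cached field afterwards, instead of going through the Socket's InetAddress on every call. A simplified sketch of the pattern; the class shape and method names are illustrative, not the exact HBaseServer code:

```java
import java.net.InetAddress;

public class Main {
    // Simplified stand-in for HBaseServer.Connection: the remote address is
    // computed once at accept time and cached in a field.
    static class Connection {
        final InetAddress peer;    // what the socket would report
        final String hostAddress;  // cached string form

        Connection(InetAddress peer) {
            this.peer = peer;
            this.hostAddress = peer.getHostAddress(); // pay the cost once
        }

        // Old path: re-derive the string from the InetAddress on each call.
        String getRemoteAddressSlow() {
            return peer.getHostAddress();
        }

        // New path: just return the cached field.
        String getRemoteAddressFast() {
            return hostAddress;
        }
    }

    public static void main(String[] args) {
        Connection c = new Connection(InetAddress.getLoopbackAddress());
        // Both paths return the same string; the fast path skips the per-call work.
        System.out.println(c.getRemoteAddressSlow().equals(c.getRemoteAddressFast()));
    }
}
```

The design choice is the usual one of hoisting an invariant computation out of a hot per-request path into connection setup.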
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419398#comment-13419398 ] nkeywal commented on HBASE-6401: @stack The oldest version of DFSInputStream.java (it was split from DFSClient) is one year old and seems ok on trunk. HBase may lose edits after a crash if used with HDFS 1.0.3 or older --- Key: HBASE-6401 URL: https://issues.apache.org/jira/browse/HBASE-6401 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Priority: Critical Attachments: TestReadAppendWithDeadDN.java This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the hdfs jira for this. Context: HBase Write Ahead Log features. This is using hdfs append. If the node crashes, the file that was written is read by other processes to replay the action. - So we have in hdfs one (dead) process writing with another process reading. - But, despite the call to syncFs, we don't always see the data when we have a dead node. It seems to be because the call in DFSClient#updateBlockInfo ignores the ipc errors and sets the length to 0. - So we may miss all the writes to the last block if we try to connect to the dead DN. hdfs 1.0.3, branch-1 or branch-1-win: we have the issue http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup hdfs branch-2 or trunk: we should not have the issue (but not tested) http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup The attached test will fail ~50% of the time.
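The failure mode described above, DFSClient#updateBlockInfo ignoring ipc errors and setting the length to 0, can be illustrated with a self-contained sketch (hypothetical names; this is not HDFS code): a failed probe of a dead datanode should leave the last known visible length alone rather than collapse it to zero.

```java
import java.io.IOException;

// Illustrative sketch, not HDFS code: when probing a datanode for the
// visible length of an under-construction block, a probe failure should
// keep the last known length instead of treating the failure as length 0,
// which is the buggy behaviour the issue describes.
public class BlockLengthSketch {

  interface DatanodeProbe {
    long visibleLength() throws IOException;  // may fail if the DN is dead
  }

  static long refreshLength(long lastKnown, DatanodeProbe probe) {
    try {
      return Math.max(lastKnown, probe.visibleLength());
    } catch (IOException deadDatanode) {
      // The buggy variant would effectively "return 0" here,
      // losing all writes to the last block.
      return lastKnown;
    }
  }

  public static void main(String[] args) {
    long ok = refreshLength(100, () -> 150);
    long dead = refreshLength(100, () -> { throw new IOException("DN down"); });
    System.out.println(ok + " " + dead); // prints 150 100
  }
}
```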
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419399#comment-13419399 ] Elliott Clark commented on HBASE-6411: -- bq. Will try to extract maps in BaseMetricsSourceImpl(s) into separate class and add support for MetricTags. I guess we agreed on that previously. Thanks. That sounds great. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419408#comment-13419408 ] Jean-Daniel Cryans commented on HBASE-6433: --- Thank you! Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
[jira] [Created] (HBASE-6436) Netty should be moved off of snapshots.
Elliott Clark created HBASE-6436: Summary: Netty should be moved off of snapshots. Key: HBASE-6436 URL: https://issues.apache.org/jira/browse/HBASE-6436 Project: HBase Issue Type: Task Reporter: Elliott Clark Assignee: Elliott Clark Netty is currently at 3.5.0.Final-SNAPSHOT; the released 3.5.0.Final should be used when possible so that snapshot repositories aren't queried when not needed.
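Pinning the dependency to the released artifact would look roughly like this in the pom (the version string comes from the issue; the groupId/artifactId coordinates are assumed, not quoted from the HBase pom):

```xml
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty</artifactId>
  <version>3.5.0.Final</version>
</dependency>
```

With a fixed release version, Maven resolves the artifact from the local cache instead of re-checking remote snapshot repositories on every build.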
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Attachment: 6435.unfinished.patch Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process, which means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. This means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS only marks a node as dead after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s plus a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead-datanode detection by the NN. Requires a NN code change. - better dead-datanode management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side, to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy. This is for two reasons: - Some HDFS functions managing block orders are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But that solution could only target the latest version, which in turn could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution.
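The client-side reordering retained above can be sketched in isolation: given the replica locations the NN returned for a WAL block, push the replicas hosted on the dead regionserver's box to the end so the client tries live datanodes first. This mirrors the intent of the patch's proxy hook, not its actual code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Self-contained sketch of the block-reordering idea from HBASE-6435:
// deprioritize replicas that live on the same host as the dead RS.
public class ReorderBlocksSketch {

  static List<String> deprioritize(List<String> locations, String deadHost) {
    List<String> reordered = new ArrayList<>(locations);
    // Stable sort: keeps the NN's order within each group,
    // while replicas on the dead host sort last.
    reordered.sort(Comparator.comparingInt(
        (String loc) -> loc.equals(deadHost) ? 1 : 0));
    return reordered;
  }

  public static void main(String[] args) {
    List<String> locs = List.of("dn1:50010", "dead-rs:50010", "dn3:50010");
    System.out.println(deprioritize(locs, "dead-rs:50010"));
    // prints [dn1:50010, dn3:50010, dead-rs:50010]
  }
}
```

A stable sort is deliberate: the namenode's distance-based ordering of the remaining replicas is preserved, only the suspect host is demoted.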
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419411#comment-13419411 ] Hadoop QA commented on HBASE-6411: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537375/HBASE-6411-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 39 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2422//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2422//console This message is automatically generated. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419414#comment-13419414 ] nkeywal commented on HBASE-6435: The patch is not finished. It contains the code for the hdfs hook and the related test, but not the code for defining the location order from the file name. But as it is different from what we initially discussed, I post it here in case someone sees something I missed. It does not mean it should not be fixed in hdfs as well, just that this is likely to be much simpler than patching the 1.0 branch... Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419423#comment-13419423 ] nkeywal commented on HBASE-6401: HDFS-3222 is not exactly this one, but not far, and fixed on 2.0 as well. HBase may lose edits after a crash if used with HDFS 1.0.3 or older --- Key: HBASE-6401 URL: https://issues.apache.org/jira/browse/HBASE-6401 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Priority: Critical Attachments: TestReadAppendWithDeadDN.java
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419446#comment-13419446 ] Todd Lipcon commented on HBASE-6435: I'm -1 on this kind of hack going into HBase before we add the feature to HDFS. I agree that adding it to HDFS proper means we have to wait for a release, but this kind of code is likely to be really fragile. Also, without HBase driving the requirements of HDFS, it will never evolve to natively have these kinds of features, and HBase will devolve into a mess of reflection hacks to change around the HDFS internals. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419461#comment-13419461 ] Zhihong Ted Yu commented on HBASE-6433: --- Integrated to trunk. Thanks for the patch, binlijin. Thanks for the review, J-D. Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419491#comment-13419491 ] Aditya Kishore commented on HBASE-6389: --- bq. BTW what do Ri, C and Fi represent in the formula above ? 'n' is the number of tables in the cluster, *R*~i~ is the number of regions and *CF*~i~ is the number of column families in table 'i' ^\[1\]^. 1. [MSLAB is ON by default|http://hbase.apache.org/book/upgrade0.92.html#d1952e2965] Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993: the Master will now proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
581   /**
582    * Wait for the region servers to report in.
583    * We will wait until one of this condition is met:
584    *  - the master is stopped
585    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
586    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
587    *    region servers is reached
588    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
589    *    there have been no new region server in for
590    *    'hbase.master.wait.on.regionservers.interval' time
591    *
592    * @throws InterruptedException
593    */
594   public void waitForRegionServers(MonitoredTask status)
595       throws InterruptedException {
...
612     while (
613       !this.master.isStopped() &&
614       slept < timeout &&
615       count < maxToStart &&
616       (lastCountChange + interval > now || count < minToStart)
617     ){
{code}
So with the current conditions, the wait ends as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the master proceeds with the region assignment among those RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
  int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
  int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    maxToStart = minToStart;
  }
..
..
  while (
    !this.master.isStopped() &&
    count < maxToStart &&
    (lastCountChange + interval > now || timeout > slept || count < minToStart)
  ){
..
{code}
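The two while-loop guards quoted above can be compared as pure predicates. This is a model for illustration, not ServerManager code; variable names follow the quoted snippet, and `true` means the master keeps waiting:

```java
// Models of the current (0.94+) and proposed while-loop guards from
// ServerManager#waitForRegionServers, as side-effect-free predicates.
public class WaitConditionSketch {

  // Current behaviour: the timeout alone can end the wait,
  // even when fewer than minToStart region servers have checked in.
  static boolean oldGuard(boolean stopped, long slept, long timeout,
                          int count, int minToStart, int maxToStart,
                          long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  // Proposed behaviour: the timeout can only end the wait once
  // minToStart region servers have checked in.
  static boolean newGuard(boolean stopped, long slept, long timeout,
                          int count, int minToStart, int maxToStart,
                          long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now
            || timeout > slept
            || count < minToStart);
  }

  public static void main(String[] args) {
    // Timeout reached, interval elapsed, but only 1 of 3 required RSes in:
    boolean oldWaits = oldGuard(false, 5000, 4500, 1, 3, 100, 0, 1500, 10000);
    boolean newWaits = newGuard(false, 5000, 4500, 1, 3, 100, 0, 1500, 10000);
    System.out.println(oldWaits + " " + newWaits); // prints false true
  }
}
```

The example in main shows exactly the scenario the issue worries about: the old guard stops waiting at the timeout with too few region servers, while the new guard keeps waiting until the quorum is met.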
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419529#comment-13419529 ] stack commented on HBASE-6389: -- @Lars It's up to you (but since you asked, fine by me ... I like what you are doing though, Aditya... thanks for the help). Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419535#comment-13419535 ] stack commented on HBASE-6433: -- @binlijin Why not change what getRemoteIp does internally (your patch copies much of the body of getRemoteIp)? Is it that getRemoteIp is used in places where the Call has not had its host address set yet? Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419551#comment-13419551 ] stack commented on HBASE-6435: -- Yeah, we should do both (I'd think that whats added to HDFS is more general than just this workaround scheme where local gets moved to the end of the list; i.e. we add being able to intercept the order returned by the NN and let a client-side policy alter it based on local knowledge if wanted Could add other customizations like being able to set timeout per DFSInput/OutputStream as you've suggested up on dev list N). Would be sweet if the 'hack' were available meantime while we wait on an hdfs release. Looking at patch, looks like inventive hackery; good on you. Do we have to do this in both master and regionserver? Can't do it in HFileSystem constructor assuming it takes a Conf (or that'd be too late?) + HFileSystem.addLocationOrderHack(conf); Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might want to add more customizations while waiting on HDFS fix to arrive. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch HBase writes a Write-Ahead-Log to revover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase gets informed usually in 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standards deployments, HBase process (regionserver) are deployed on the same box as the datanodes. It means that when the box stops, we've actually lost one of the edits, as we lost both the regionserver and the datanode. 
As HDFS only marks a node as dead after ~10 minutes, a dead node still appears as available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still opened for writing, it adds an extra 20s, plus a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead datanode detection by the NN. Requires an NN code change. - better dead datanode management in the DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy. This is for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require implementing the fix only partially, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But that solution could only target the latest version, though it would allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better solution long term. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
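The client-side reordering the description settles on can be sketched as a plain list transformation, independent of HDFS internals; the class and method names below are illustrative, not taken from the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLocationReorder {
    // Sketch of the retained workaround: move every replica hosted on the
    // suspected-dead datanode to the end of the priority list, preserving
    // the relative order of the remaining replicas, so the reader tries
    // live datanodes first instead of timing out on the dead one.
    public static List<String> deprioritize(List<String> replicaHosts, String deadHost) {
        List<String> alive = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(deadHost)) {
                suspect.add(host);
            } else {
                alive.add(host);
            }
        }
        alive.addAll(suspect); // dead-RS replicas go last, not first
        return alive;
    }
}
```

In the actual proposal this reordering happens in a proxy sitting between the DFSClient and the namenode, applied to the block locations the NN returns.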
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419552#comment-13419552 ] Hudson commented on HBASE-6433: --- Integrated in HBase-TRUNK #3155 (See [https://builds.apache.org/job/HBase-TRUNK/3155/]) HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Currently, HBaseServer#getRemoteAddress calls getRemoteIp(), leading to call.connection.socket.getInetAddress(). The host address is actually stored in the HBaseServer.Connection.hostAddress field, so we don't need to go through the Socket to get this information. Without this patch it costs ~4000 ns; with this patch it costs ~1600 ns.
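The optimization the issue describes is a simple one: capture the remote address once at connection setup and serve subsequent calls from a field. A minimal self-contained sketch (a stand-in for HBaseServer.Connection, not the real class):

```java
import java.net.InetAddress;

public class CachedConnection {
    // Illustrative stand-in for HBaseServer.Connection: the remote address
    // string is resolved once in the constructor, so getRemoteAddress()
    // reads a field instead of going through Socket.getInetAddress()
    // on every call, which is what made the old path ~2.5x slower.
    private final String hostAddress;

    public CachedConnection(InetAddress remote) {
        this.hostAddress = remote.getHostAddress(); // resolved once
    }

    public String getRemoteAddress() {
        return hostAddress; // hot path: plain field read
    }
}
```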
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419561#comment-13419561 ] Zhihong Ted Yu commented on HBASE-6433: --- Looking at the code, Connection has this member: {code} private InetAddress addr; {code} But I don't see where it is assigned. The following assignment is to a local variable: {code} InetAddress addr = socket.getInetAddress(); {code}
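The pitfall Ted is pointing at is local-variable shadowing: re-declaring the type makes a new local, leaving the member field null. A compact illustration (class and method names hypothetical):

```java
import java.net.InetAddress;

public class ShadowingExample {
    // The member the rest of the class expects to be populated.
    private InetAddress addr;

    // Buggy pattern: "InetAddress addr = ..." declares a local variable
    // that shadows the field, so the field is never assigned.
    void setupBuggy(InetAddress remote) {
        InetAddress addr = remote; // assigns the local, not the member
    }

    // Fixed pattern: drop the re-declaration (or use this.addr).
    void setupFixed(InetAddress remote) {
        this.addr = remote;
    }

    InetAddress getAddr() {
        return addr;
    }
}
```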
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419579#comment-13419579 ] stack commented on HBASE-6401: -- @Nkeywal We need another patch on top of hdfs-3222? HBase may lose edits after a crash if used with HDFS 1.0.3 or older --- Key: HBASE-6401 URL: https://issues.apache.org/jira/browse/HBASE-6401 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Priority: Critical Attachments: TestReadAppendWithDeadDN.java This comes from an HDFS bug, fixed in some HDFS versions. I haven't found the hdfs jira for this. Context: HBase Write-Ahead-Log features. This is using hdfs append. If the node crashes, the file that was written is read by other processes to replay the actions. - So we have in hdfs one (dead) process writing while another process reads. - But, despite the call to syncFs, we don't always see the data when we have a dead node. It seems to be because the call in DFSClient#updateBlockInfo ignores the ipc errors and sets the length to 0. - So we may miss all the writes to the last block if we try to connect to the dead DN. hdfs 1.0.3, branch-1 or branch-1-win: we have the issue http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup hdfs branch-2 or trunk: we should not have the issue (but not tested) http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup The attached test will fail ~50% of the time.
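The failure mode described (DFSClient#updateBlockInfo swallowing ipc errors and reporting length 0) boils down to a lenient-vs-strict error-handling choice. A self-contained sketch of the two patterns; the interface and method names are hypothetical, not the real DFSClient API:

```java
import java.io.IOException;

public class UpdateBlockInfoSketch {
    // Stand-in for the ipc call that asks the last datanode for the
    // visible length of the block still being written.
    interface LengthProbe {
        long visibleLength() throws IOException;
    }

    // Buggy pattern (what the description attributes to branch-1):
    // swallow the error and report an empty last block, silently
    // truncating the in-flight edits.
    static long lengthLenient(LengthProbe probe) {
        try {
            return probe.visibleLength();
        } catch (IOException e) {
            return 0; // dead DN => edits in the last block appear lost
        }
    }

    // Safer pattern: propagate the failure so the reader can fail over
    // to another replica instead of trusting a bogus zero length.
    static long lengthStrict(LengthProbe probe) throws IOException {
        return probe.visibleLength();
    }
}
```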
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419581#comment-13419581 ] nkeywal commented on HBASE-6435: My thinking was it could make it into an hdfs release that accepts changing public interfaces. I fully agree with you Todd, we need to do our homework and push hdfs to ensure that what we need is understood and makes it into a release. On the other hand, if I look at how it worked for much simpler stuff like JUnit and surefire, our changes have been in their trunk for a few months and we're still waiting. These things take time. But I will do my homework on hdfs, I promise (I may need your help actually). The jira will be created next week and if I have enough feedback I will propose a patch. I was also wondering if natively proposing to have interceptors would not be interesting for hdfs. It was available for a long time in an orb called Orbix and was great to use. But they would need to be per conf, so they cannot be available with static stuff. bq. Do we have to do this in both master and regionserver? Can't do it in HFileSystem constructor assuming it takes a Conf (or that'd be too late?) It can be put in pretty late, basically before we start a recovery process. But we don't want it client side, so I will check this. bq. Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might want to add more customizations while waiting on HDFS fix to arrive. I've intercepted a lower-level call: I'm between the DFSClient and the namenode. This is because the DFSClient does more than just transferring calls: it contains some logic. Hence going in front of the namenode. But yes, I could make it more generic.
[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older
[ https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419585#comment-13419585 ] nkeywal commented on HBASE-6401: On 1.x, yes, I think that backporting hdfs-3222 won't be enough. On 2.0, it seems it's ok, even if I can't find the good soul who fixed it. As I can't find a jira, I can create a new one and propose a fix specific to branch-1.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419588#comment-13419588 ] Todd Lipcon commented on HBASE-6435: I think there's a good motivation to add these kinds of APIs generally to DFSInputStream. In particular, I think something like the following: {noformat} public List<Replica> getAvailableReplica(long pos); // return the list of available replicas at the given file offset, in priority order public void prioritizeReplica(Replica r); // move the given replica to the front of the list public void blacklistReplica(Replica r); // move the replica to the back of the list {noformat} (or something of this sort) The Replica API would then expose the datanode IDs (and after HDFS-3672, the disk ID). So, in HBase we could simply open the file, enumerate the replicas, deprioritize the one on the suspected node, and move on with the normal code paths.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419608#comment-13419608 ] nkeywal commented on HBASE-6435: I understand that you don't want to expose the internals nor something like the DatanodeInfo. The same type of API would be useful for the output stream, putting priorities on nodes (and so reusing some knowledge about the dead nodes, or, for the wal, removing the local writes). It's simple and efficient. With the current DFSClient implementation, a callback would ease cases like opening a file already opened for writing, or when a node list is cleared after they have all failed. But maybe it can be changed as well.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419623#comment-13419623 ] Todd Lipcon commented on HBASE-6435: bq. With the current DFSClient implementation, a callback would ease cases like opening a file already opened for writing, or when a node list is cleared when they all failed. But may be it can be changed as well. Can you explain further what you mean here? What would you use these callbacks for?
[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id
[ https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6321: -- Attachment: HBASE-6321-0.94.patch Had a stab at this. What I figured is that the getting of UUIDs was done outside of {{ReplicationZookeeper}}, so it was missing the functionality from that class (you can also see the feature envy that was going on there). I refactored the ugly UUID stuff in {{ReplicationSource.run}} into {{ReplicationZookeeper.getPeerUUID}}. There I needed to handle the session expiration issues, so I refactored that from another method into {{reconnectPeer}}. Now that that issue is handled, the possibility of a null UUID remained if the peer wasn't reachable, so I added a loop in {{ReplicationSource}}. Finally, I saw that we were doing the UUID dance in {{ReplicationSource.init}} for the current cluster, so I pushed that to {{ReplicationZookeeper.getUUIDForCluster}} and refactored {{getPeerUUID}} to use it. The code should be clearer and more reliable.
ReplicationSource dies reading the peer's id Key: HBASE-6321 URL: https://issues.apache.org/jira/browse/HBASE-6321 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Fix For: 0.92.2, 0.96.0, 0.94.2 Attachments: HBASE-6321-0.94.patch This is what I saw: {noformat} 2012-07-01 05:04:01,638 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing source 8 because an error occurred: Could not read peer's cluster id org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /va1-backup/hbaseid at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154) at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259) at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253) {noformat} The session should just be reopened.
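The retry-instead-of-die behavior described in the patch notes can be sketched as a loop that reconnects on failure and keeps trying; the names below are hypothetical stand-ins for the refactored {{getPeerUUID}}/{{reconnectPeer}} pair, not the actual HBase code:

```java
import java.util.UUID;

public class PeerUuidFetcher {
    // Stand-in for the ZooKeeper read of the peer cluster's /hbaseid znode,
    // which can throw SessionExpiredException.
    interface PeerLookup {
        UUID readClusterId() throws Exception;
    }

    // Keep asking the peer cluster for its UUID, reconnecting the expired
    // session between attempts, instead of letting the replication source
    // close itself on the first failure.
    public static UUID getPeerUUID(PeerLookup peer, Runnable reconnect, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return peer.readClusterId();
            } catch (Exception e) { // e.g. KeeperException.SessionExpiredException
                reconnect.run();    // reopen the session, then retry
            }
        }
        return null; // peer still unreachable; the caller loops until it is
    }
}
```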
[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id
[ https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6321: -- Fix Version/s: (was: 0.90.8) Assignee: Jean-Daniel Cryans Assigning this to me and removing the 0.90 target since I found out that that part of the code was added in 0.92
[jira] [Commented] (HBASE-6321) ReplicationSource dies reading the peer's id
[ https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419633#comment-13419633 ] Jean-Daniel Cryans commented on HBASE-6321: --- Oh I forgot to mention that I ran TestReplication/Source/Manager 2 times each and they all passed.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419646#comment-13419646 ] nkeywal commented on HBASE-6435: If I try to keep the existing interface: today, when you open a file, there is a call to a datanode if the file is also opened for writing somewhere. In HBase, we want the priorities to be taken into account during this opening, as we have a guess that one of these datanodes may be dead. So either I register a callback that the DFSClient will call before using its list, or I change the 'open' interface to add the possibility to provide the list of replicas. Same thing for chooseDataNode called from blockSeekTo: even if we have a list at the beginning, this list is recreated during a read as part of the retry process (in case the NN discovered new replicas on new datanodes). If we put in a callback, we would offer this service: {noformat} class ReplicaSet { public List<Replica> getAvailableReplica(long pos); // return the list of available replicas at the given file offset, in priority order public void prioritizeReplica(Replica r); // move the given replica to the front of the list public void blacklistReplica(Replica r); // move the replica to the back of the list } {noformat} The client would need to implement this interface: {noformat} // Implement this interface and provide it to the DFSClient during its construction to manage the replica ordering interface OrganizeReplicaSet { void organize(String fileName, ReplicaSet rs); } {noformat} And the DFSClient code would become: {noformat} LocatedBlocks callGetBlockLocations(ClientProtocol namenode, String src, long start, long length) throws IOException { try { LocatedBlocks lbs = namenode.getBlockLocations(src, start, length); if (organizeReplicaSet != null) { ReplicaSet rs = lbs.getAsReplicaSet(); try { organizeReplicaSet.organize(src, rs); } catch (Throwable t) { throw new IOException("ClientBlockReorderer failed, class=" + organizeReplicaSet.getClass(), t); } return new LocatedBlocks(rs); } else { return lbs; } {noformat} This is called from the DFSInputStream constructor in openInfo today. In real life I would try to use the class ReplicaSet as an interface on the internal LocatedBlock(s) to limit the number of objects created. The callback could also be given as a parameter to the DFSInputStream constructor if there is a specific rule to apply...
- reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy. This is for two reasons:
- Some HDFS functions managing block orders are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require implementing the fix partially, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions
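To make the sketched hook concrete, here is a minimal, self-contained Java illustration of the reordering policy HBase would register. The Replica and ReplicaSet classes below are stand-ins for the proposed (hypothetical) DFSClient types, not real HDFS APIs; blacklistReplica simply demotes a replica to the end of the priority list, which is all the WAL-recovery case needs:

```java
import java.util.ArrayList;
import java.util.List;

public class ReorderSketch {
    // Stand-in for the Replica handle sketched in the comment above.
    static class Replica {
        final String host;
        Replica(String host) { this.host = host; }
    }

    // Minimal stand-in for the proposed ReplicaSet service.
    static class ReplicaSet {
        final List<Replica> replicas = new ArrayList<>();
        // Move the given replica to the back of the priority list.
        void blacklistReplica(Replica r) {
            replicas.remove(r);
            replicas.add(r);
        }
    }

    // The policy HBase would plug in: demote replicas living on the same
    // host as the dead regionserver, since its local datanode is suspect.
    static void organize(String deadRsHost, ReplicaSet rs) {
        for (Replica r : new ArrayList<>(rs.replicas)) {
            if (r.host.equals(deadRsHost)) {
                rs.blacklistReplica(r);
            }
        }
    }

    public static void main(String[] args) {
        ReplicaSet rs = new ReplicaSet();
        rs.replicas.add(new Replica("dn1"));
        rs.replicas.add(new Replica("deadbox"));
        rs.replicas.add(new Replica("dn2"));
        organize("deadbox", rs);
        // The suspect replica is now tried last instead of stalling the read.
        System.out.println(rs.replicas.get(2).host); // deadbox
    }
}
```

The DFSClient would still fall back to the demoted replica if the healthy ones fail, so the reorder only changes latency, never availability.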
[jira] [Commented] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
[ https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419648#comment-13419648 ] Hudson commented on HBASE-5985: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-5985 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 (Revision 1363561) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 - Key: HBASE-5985 URL: https://issues.apache.org/jira/browse/HBASE-5985 Project: HBase Issue Type: Test Components: test Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: hbase-5985.patch --- Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec FAILURE! org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD Time elapsed: 0 sec ERROR! 
java.io.IOException: Failed put; errcode=1 at org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124) at org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6397) [hbck] print out bulk load commands for sidelined regions if necessary
[ https://issues.apache.org/jira/browse/HBASE-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419654#comment-13419654 ] Hudson commented on HBASE-6397: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6397 [hbck] print out bulk load commands for sidelined regions if necessary (Revision 1362247) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java [hbck] print out bulk load commands for sidelined regions if necessary -- Key: HBASE-6397 URL: https://issues.apache.org/jira/browse/HBASE-6397 Project: HBase Issue Type: Bug Components: hbck Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 6397-trunk.patch It's better to print out in the log the command line to bulk load back sidelined regions, if any. Separate it out from HBASE-6392 since it is a different issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6426) Add Hadoop 2.0.x profile to 0.92+
[ https://issues.apache.org/jira/browse/HBASE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419647#comment-13419647 ] Hudson commented on HBASE-6426: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6426 Add Hadoop 2.0.x profile to 0.92+ (Revision 1363211) Result = FAILURE larsh : Files : * /hbase/branches/0.94/pom.xml Add Hadoop 2.0.x profile to 0.92+ - Key: HBASE-6426 URL: https://issues.apache.org/jira/browse/HBASE-6426 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.1 Attachments: 6426.txt 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.94 as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419649#comment-13419649 ] Hudson commented on HBASE-6389: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6389 revert (Revision 1363193) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments Key: HBASE-6389 URL: https://issues.apache.org/jira/browse/HBASE-6389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Critical Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack Continuing from HBASE-6375. It seems I was mistaken in my assumption that changing the value of hbase.master.wait.on.regionservers.mintostart to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s). While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993. From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been reached. Reading the current conditions of waitForRegionServers() clarifies it:
{code:title=ServerManager.java (trunk rev:1360470)}
/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
    throws InterruptedException {
  ..
  while (
    !this.master.isStopped() &&
    slept < timeout &&
    count < maxToStart &&
    (lastCountChange + interval > now || count < minToStart)
  ) {
{code}
So with the current conditions, the wait will end as soon as the timeout is reached even if a lesser number of RSes have checked in with the Master, and the master will proceed with the region assignment among these RSes alone. As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on. To enforce the required quorum as specified by hbase.master.wait.on.regionservers.mintostart irrespective of the timeout, these conditions need to be modified as follows:
{code:title=ServerManager.java}
..
/**
 * Wait for the region servers to report in.
 * We will wait until one of this condition is met:
 *  - the master is stopped
 *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 *    region servers is reached
 *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
 *    there have been no new region server in for
 *    'hbase.master.wait.on.regionservers.interval' time AND
 *    the 'hbase.master.wait.on.regionservers.timeout' is reached
 *
 * @throws InterruptedException
 */
public void waitForRegionServers(MonitoredTask status)
..
..
int minToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.mintostart", 1);
int maxToStart = this.master.getConfiguration().
    getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
if (maxToStart < minToStart) {
  maxToStart = minToStart;
}
..
..
while (
  !this.master.isStopped() &&
  count < maxToStart &&
  (lastCountChange + interval > now || timeout > slept || count < minToStart)
) {
..
{code}
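The difference between the two guards is easiest to see when they are extracted as pure predicates. The method names and scenario values below are illustrative only, not actual ServerManager code; the scenario is "timeout elapsed, quiet interval elapsed, but only 1 of the required 3 region servers has checked in":

```java
public class WaitConditions {
    // Current (0.94+) loop guard: the wait can end once slept >= timeout,
    // even if fewer than minToStart region servers have reported in.
    static boolean shouldWaitOld(boolean stopped, long slept, long timeout,
                                 int count, int minToStart, int maxToStart,
                                 long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed guard: minToStart is enforced regardless of the timeout.
    static boolean shouldWaitNew(boolean stopped, long slept, long timeout,
                                 int count, int minToStart, int maxToStart,
                                 long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // slept == timeout == 4500ms, interval long expired, count 1 < minToStart 3
        boolean old = shouldWaitOld(false, 4500, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 10000);
        boolean neu = shouldWaitNew(false, 4500, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 10000);
        System.out.println("old keeps waiting: " + old + ", new keeps waiting: " + neu);
        // old keeps waiting: false, new keeps waiting: true
    }
}
```

Under the old guard the master proceeds to assign all regions to the single reporting RS; under the new guard it keeps waiting until the quorum of 3 is met.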
[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419651#comment-13419651 ] Hudson commented on HBASE-6406: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6406 Remove TestReplicationPeer (Revision 1363213) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.1 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419650#comment-13419650 ] Hudson commented on HBASE-6319: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6319 ReplicationSource can call terminate on itself and deadlock HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java ReplicationSource can call terminate on itself and deadlock --- Key: HBASE-6319 URL: https://issues.apache.org/jira/browse/HBASE-6319 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6319-0.92.patch In a few places the ReplicationSource code calls terminate() on itself, which is a problem since in terminate() we wait on that thread to die.
[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient
[ https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419652#comment-13419652 ] Hudson commented on HBASE-4956: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob Copeland) (Revision 1363533) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java Control direct memory buffer consumption by HBaseClient --- Key: HBASE-4956 URL: https://issues.apache.org/jira/browse/HBASE-4956 Project: HBase Issue Type: New Feature Reporter: Ted Yu Assignee: Bob Copeland Fix For: 0.96.0, 0.94.1 Attachments: 4956.txt, thread_get.rb As Jonathan explained here https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1 , standard hbase client inadvertently consumes large amount of direct memory. We should consider using netty for NIO-related tasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions
[ https://issues.apache.org/jira/browse/HBASE-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419657#comment-13419657 ] Hudson commented on HBASE-6392: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6392 UnknownRegionException blocks hbck from sideline big overlap regions (Revision 1363202) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java UnknownRegionException blocks hbck from sideline big overlap regions Key: HBASE-6392 URL: https://issues.apache.org/jira/browse/HBASE-6392 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 6392-0.90.patch, 6392-trunk.patch, 6392-trunk_v2.patch, 6392_0.92.patch Before sidelining a big overlap region, hbck tries to close it and offline it at first. However, sometimes, it throws NotServingRegion or UnknownRegionException. It could be because the region is not open/assigned at all, or some other issue. We should figure out why and fix it. By the way, it's better to print out in the log the command line to bulk load back sidelined regions, if any. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6420) Gracefully shutdown logsyncer
[ https://issues.apache.org/jira/browse/HBASE-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419653#comment-13419653 ] Hudson commented on HBASE-6420: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6420 Gracefully shutdown logsyncer (Revision 1363416) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java Gracefully shutdown logsyncer - Key: HBASE-6420 URL: https://issues.apache.org/jira/browse/HBASE-6420 Project: HBase Issue Type: Bug Components: wal Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: 6420-trunk.patch Currently, when closing an HLog, logSyncerThread is interrupted. logSyncer could be in the middle of syncing the writer. We should avoid interrupting the sync.
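A minimal sketch (all names hypothetical, not the actual HLog code) of the graceful-shutdown direction: raise a flag and join the syncer thread instead of interrupting it, so a sync pass is never cut short by Thread.interrupt():

```java
public class GracefulSyncer {
    private volatile boolean closing = false;
    private volatile int completedSyncs = 0;   // stands in for writer.sync() calls
    private final Thread syncerThread = new Thread(this::syncLoop);

    private void syncLoop() {
        while (!closing) {
            completedSyncs++;     // a "sync pass" runs to completion;
            sleepQuietly(1);      // the flag is only checked between passes
        }
    }

    private static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }

    public void start() { syncerThread.start(); }

    // Graceful close: no Thread.interrupt(), so no in-flight sync is aborted.
    public void close() {
        closing = true;
        try { syncerThread.join(); } catch (InterruptedException ignored) { }
    }

    // Self-contained demo: true if the syncer made progress and exited cleanly.
    public static boolean demo() {
        GracefulSyncer s = new GracefulSyncer();
        s.start();
        sleepQuietly(20);
        s.close();
        return !s.syncerThread.isAlive() && s.completedSyncs > 0;
    }

    public static void main(String[] args) {
        System.out.println("clean shutdown: " + demo());
    }
}
```

The join() after setting the flag also gives the closer a happens-before edge on everything the syncer wrote, which an interrupt-based shutdown does not guarantee by itself.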
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419658#comment-13419658 ] Hudson commented on HBASE-6325: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6319 ReplicationSource can call terminate on itself and deadlock HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive - Key: HBASE-6325 URL: https://issues.apache.org/jira/browse/HBASE-6325 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch Yet another bug found during the leap second madness, it's possible to miss the registration of new region servers so that in ReplicationSourceManager.init we start the failover of a live and replicating region server. 
I don't think there's data loss but the RS that's being failed over will die on: {noformat} 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) {noformat} It seems to me that just refreshing {{otherRegionServers}} after getting the list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6382) Upgrade Jersey to 1.8 to match Hadoop 1 and 2
[ https://issues.apache.org/jira/browse/HBASE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419656#comment-13419656 ] Hudson commented on HBASE-6382: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-6382 Upgrade Jersey to 1.8 to match Hadoop 1 and 2 (David S. Wang) (Revision 1362308) Result = FAILURE larsh : Files : * /hbase/branches/0.94/pom.xml Upgrade Jersey to 1.8 to match Hadoop 1 and 2 - Key: HBASE-6382 URL: https://issues.apache.org/jira/browse/HBASE-6382 Project: HBase Issue Type: Improvement Components: rest Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.2 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6382-trunk.patch Upgrade Jersey dependency from 1.4 to 1.8 to match Hadoop dependencies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419655#comment-13419655 ] Hudson commented on HBASE-5966: --- Integrated in HBase-0.94-security #44 (See [https://builds.apache.org/job/HBase-0.94-security/44/]) HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory Chanan) (Revision 1363586) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java MapReduce based tests broken on Hadoop 2.0.0-alpha -- Key: HBASE-5966 URL: https://issues.apache.org/jira/browse/HBASE-5966 Project: HBase Issue Type: Bug Components: mapred, mapreduce, test Affects Versions: 0.94.0, 0.96.0 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) Reporter: Andrew Purtell Assignee: Jimmy Xiang Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, HBASE-5966.patch, hbase-5966.patch Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test rigging. Below is a representative error, can be easily reproduced with: {noformat} mvn -PlocalTests -Psecurity \ -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ clean test \ -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce {noformat} And the result: {noformat} --- T E S T S --- Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec FAILURE! testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) Time elapsed: 21.935 sec ERROR! 
java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at
[jira] [Resolved] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme
[ https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-6310. --- Resolution: Invalid Fix Version/s: (was: 0.94.2) (was: 0.96.0) I'm resolving this as invalid. I was thrown in the wrong direction by what I thought were old/new .META. rows (they in fact never changed), whereas it was a .META. region from almost 3 years ago that was brought back to life. It could have been something like HBASE-6417 that happened, but since I don't have those logs anymore I can't be 100% sure until I reproduce the issue. -ROOT- corruption when .META. is using the old encoding scheme -- Key: HBASE-6310 URL: https://issues.apache.org/jira/browse/HBASE-6310 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: Jean-Daniel Cryans Priority: Blocker We're still working on the root cause here, but after the leap second armageddon we had a hard time getting our 0.94 cluster back up. This is what we saw in the logs until the master died by itself:
{noformat}
2012-07-01 23:01:52,149 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28, port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000 because: HRegionInfo was null or empty in -ROOT-, row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0, .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
{noformat}
(it's strange that we retry this) This was really misleading because I could see the regioninfo in a scan:
{noformat}
hbase(main):002:0> scan '-ROOT-'
ROW                     COLUMN+CELL
 .META.,,1              column=info:regioninfo, timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
 .META.,,1              column=info:server, timestamp=1341183448693, value=sfor3s40:10304
 .META.,,1              column=info:serverstartcode, timestamp=1341183448693, value=1341183444689
 .META.,,1              column=info:v, timestamp=1331755419291, value=\x00\x00
 .META.,,1259448304806  column=info:server, timestamp=1341124914705, value=sfor3s24:10304
 .META.,,1259448304806  column=info:serverstartcode, timestamp=1341124914705, value=1341124455863
{noformat}
Except that the devil is in the details: .META.,,1 is not .META.,,1259448304806. Basically something writes to .META. by directly creating the row key, without caring if the row is in the old format. I did a deleteall in the shell and it fixed the issue... until some time later it was stuck again because the edits reappeared (still not sure why). This time the PostOpenDeployTasksThread were stuck in the RS trying to update .META. but there was no logging (saw it with a jstack). I deleted the row again to make it work. I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 out, but I wouldn't recommend upgrading to 0.94 if your cluster was created before 0.89.
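The two rows above differ only in the trailing region id, which is why a writer that rebuilds the key from scratch can silently create a second row. The helper below is hypothetical, not HBase code; it only mirrors the visible `table,startKey,regionId` shape of the row keys in the scan:

```java
public class MetaRowKeys {
    // Hypothetical helper mirroring the visible row-key shape:
    // tableName + "," + startKey + "," + regionId
    static String metaRowKey(String table, String startKey, long regionId) {
        return table + "," + startKey + "," + regionId;
    }

    public static void main(String[] args) {
        String newFormat = metaRowKey(".META.", "", 1L);             // ".META.,,1"
        String oldFormat = metaRowKey(".META.", "", 1259448304806L); // ".META.,,1259448304806"
        // Distinct keys: a write built with the wrong region id lands in a
        // second row instead of updating the existing one.
        System.out.println(newFormat.equals(oldFormat)); // false
    }
}
```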
[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice
[ https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419691#comment-13419691 ] Jimmy Xiang commented on HBASE-6228: I'd like to fix this in HBASE-6381 by making sure SSH blocks on AM.processServerShutdown until the master has joined the cluster and fixed missing daughters. Fixup daughters twice cause daughter region assigned twice --- Key: HBASE-6228 URL: https://issues.apache.org/jira/browse/HBASE-6228 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch First, how does fixing up daughters twice happen?
1. We will fixupDaughters at the end of HMaster#finishInitialization.
2. ServerShutdownHandler will fixupDaughters when reassigning regions through ServerShutdownHandler#processDeadRegion.
When fixing up daughters, we will add the daughters to .META., but that couldn't prevent the above case, because of FindDaughterVisitor. The detail is as follows: Suppose region A is a split parent region, and its daughter region B is missing.
1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds daughter region B to .META. with serverName=null, and assigns the daughter.
2. Then, the Master's initialization thread will also find that daughter region B is missing and assign it. That is because FindDaughterVisitor considers a daughter missing if its serverName=null.
[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress
[ https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419692#comment-13419692 ] Hudson commented on HBASE-6433: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #101 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/101/]) HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress Key: HBASE-6433 URL: https://issues.apache.org/jira/browse/HBASE-6433 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: binlijin Priority: Minor Fix For: 0.96.0 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch Currently, HBaseServer#getRemoteAddress calls getRemoteIp(), which leads to call.connection.socket.getInetAddress(). The host address is already stored in the HBaseServer.Connection.hostAddress field, so we don't need to go through the Socket to get this information. Without this patch the call costs ~4000ns; with it, ~1600ns.
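The improvement is a classic cache-at-construction pattern. A minimal self-contained sketch (a hypothetical `Connection` class, not the real HBaseServer.Connection):

```java
import java.net.InetAddress;

// Hypothetical sketch: instead of walking connection -> socket ->
// getInetAddress() on every call, capture the peer's host address string
// once when the connection is accepted and return the cached value.
public class Connection {
    private final String hostAddress; // cached at accept time

    public Connection(InetAddress peer) {
        this.hostAddress = peer.getHostAddress();
    }

    /** Cheap accessor; no Socket lookup on the hot path. */
    public String getRemoteAddress() {
        return hostAddress;
    }

    public static void main(String[] args) {
        Connection c = new Connection(InetAddress.getLoopbackAddress());
        System.out.println(c.getRemoteAddress());
    }
}
```

Caching is safe here because a connection's peer address never changes over its lifetime.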
[jira] [Commented] (HBASE-6428) Pluggable Compaction policies
[ https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419695#comment-13419695 ] Lars Hofhansl commented on HBASE-6428: -- That is an excellent point. Should also think about HBASE-6427 with this in mind. Pluggable Compaction policies - Key: HBASE-6428 URL: https://issues.apache.org/jira/browse/HBASE-6428 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl For some use cases it is useful to allow more control over how KVs get compacted. For example, one could envision storing old versions of a KV in separate HFiles, which then rarely have to be touched/cached by queries querying for new data. In addition, these date-ranged HFiles can easily be used for backups while maintaining historical data. This would be a major change, allowing compactions to provide multiple targets (not just a filter).
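A minimal sketch of the multiple-targets idea (a hypothetical API, not an HBase interface): a router that sends each cell version to one of several output files based on its timestamp, so old versions land in a separate, rarely-read file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a compaction policy with multiple output targets:
// cells at or after the cutoff go to a "recent" file, older versions to an
// "old" file that queries for new data rarely touch. Cells are modeled by
// their timestamps only.
public class DateTieredRouter {
    public static List<List<Long>> route(List<Long> timestamps, long cutoff) {
        List<Long> recent = new ArrayList<>();
        List<Long> old = new ArrayList<>();
        for (long ts : timestamps) {
            (ts >= cutoff ? recent : old).add(ts); // pick target file by age
        }
        List<List<Long>> targets = new ArrayList<>();
        targets.add(recent);
        targets.add(old);
        return targets;
    }
}
```

The "old" target doubles as a date-ranged backup unit, matching the use case in the description.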
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419696#comment-13419696 ] Lars Hofhansl commented on HBASE-5547: -- +1 on patch. Ted pinged me that he is out already. Since this is a Salesforce patch, I should commit it anyway. Will do so as soon as I get to it. Jesse, do you have a feeling for how different a 0.94 patch would be? Don't delete HFiles when in backup mode - Key: HBASE-5547 URL: https://issues.apache.org/jira/browse/HBASE-5547 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Jesse Yates Fix For: 0.94.2 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, java_HBASE-5547_v7.patch This came up in a discussion I had with Stack. It would be nice if HBase could be notified that a backup is in progress (via a znode for example) and in that case either: 1. rename HFiles to be deleted to file.bck 2. rename the HFiles into a special directory 3. rename them to a general trash directory (which would not need to be tied to backup mode). That way it should be possible to get a consistent backup based on HFiles (HDFS snapshots or hard links would be better options here, but we do not have those). #1 makes cleanup a bit harder.
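Options 2 and 3 in the description amount to mapping each HFile path into an archive directory instead of deleting it. A sketch of just that path mapping (hypothetical layout and helper, not the attached patch):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: mirror an HFile's table/region/cf layout under an
// archive root, so a move there (instead of a delete) leaves the file where
// an in-progress backup can still copy it.
public class HFileArchiver {
    /** Maps root/table/region/cf/hfile to archiveRoot/table/region/cf/hfile. */
    public static Path archivePath(Path hbaseRoot, Path archiveRoot, Path hfile) {
        return archiveRoot.resolve(hbaseRoot.relativize(hfile));
    }

    public static void main(String[] args) {
        Path p = archivePath(Paths.get("/hbase"), Paths.get("/hbase/.archive"),
                Paths.get("/hbase/t1/r1/cf/hf1"));
        System.out.println(p);
    }
}
```

Keeping the per-table/region/cf structure in the archive makes later cleanup and restore straightforward, which is the drawback called out for option 1 (flat `.bck` renames).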
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419697#comment-13419697 ] Jesse Yates commented on HBASE-5547: @Lars I don't think it would be all that different. I'll take a crack at it next week (after dealing with the next round of HBASE-6055 stuff).
[jira] [Commented] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally
[ https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419722#comment-13419722 ] Lars Hofhansl commented on HBASE-5659: -- Without parent the revised test fails every time. With parent it fails rarely. I do not know what the issue is. This only happens when the test does heavy flushing (during the course of the test ~1000 flushes happen), so the problem might be there. I can offer to disable the test or to reduce the number of flushes for now, but of course that papers over the problem. I also would not mind if somebody else had a look at the test and checked whether the test logic itself is flawed. TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally -- Key: HBASE-5659 URL: https://issues.apache.org/jira/browse/HBASE-5659 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.0 See run here: https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/ {quote} 2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): Storescanner.peek() is changed where before = rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0 2012-03-27 04:36:12,629 INFO [Thread-121] regionserver.HRegion(1558): Finished memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81.
in 14ms, sequenceid=7927, compaction requested=true 2012-03-27 04:36:12,629 DEBUG [Thread-126] regionserver.TestAtomicOperation$2(362): flushing 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): Started memstore flush for testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., current region memstore size 1.9k 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): Finished snapshotting testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., commencing wait for mvcc, flushsize=1968 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): Finished snapshotting, commencing flushing stores 2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57 with permission=rwxrwxrwx 2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false] 2012-03-27 04:36:12,631 INFO [Thread-126] regionserver.StoreFile$Writer(997): Delete Family Bloom filter type for /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57: CompoundBloomFilterWriter 2012-03-27 04:36:12,632 INFO [Thread-126] regionserver.StoreFile$Writer(1220): NO General Bloom and NO DeleteFamily was added to HFile 
(/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57) 2012-03-27 04:36:12,632 INFO [Thread-126] regionserver.Store(770): Flushed , sequenceid=7934, memsize=1.9k, into tmp file /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57 2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming flushed file at /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57 to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57 2012-03-27 04:36:12,634 INFO [Thread-126] regionserver.Store(818): Added
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419723#comment-13419723 ] Lars Hofhansl commented on HBASE-5547: -- I also verified in a real setup that an HFile is indeed archived and (by default) removed after 5 mins. I was thrown off at first, because the table/region/cf directory is not removed when empty. Also made sure I can create/drop tables and that .META. is backed up correctly. So still +1 :)
[jira] [Updated] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally
[ https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5659: - Fix Version/s: 0.94.2 TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally -- Key: HBASE-5659 URL: https://issues.apache.org/jira/browse/HBASE-5659 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.2 See run here: https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/ {quote} 2012-03-27 04:36:12,642 DEBUG [Thread-118] regionserver.TestAtomicOperation$2(392): [] Exception in thread Thread-118 junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48)
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419724#comment-13419724 ] Lars Hofhansl commented on HBASE-5547: -- Ok. One more question I asked just now on RB. From Matteo: {quote} MasterFileSystem contains deleteRegion() and deleteTable(), which call fs.delete() with the recursive flag on. These two methods get called by DeleteTableHandler (drop table). In a backup/snapshot situation we want to keep the regions/hfiles. {quote} My follow-up question: {quote} I find that deleteRegion() was addressed, but not deleteTable(). That means if a table is dropped the HFiles would be deleted and not archived. So it seems we should either: - also delete the table's archive directory (since it would be incomplete anyway), or - archive all the HFiles before deleting them. What do you think, Jesse? {quote}
[jira] [Comment Edited] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419725#comment-13419725 ] Lars Hofhansl edited comment on HBASE-5547 at 7/21/12 2:49 AM: --- Or is it that all regions are first deleted anyway, and only then the deleteTable is called (in DeleteTableHandler.handleTableOperation)? Edit: Spelling
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419725#comment-13419725 ] Lars Hofhansl commented on HBASE-5547: -- Or is it that all region are first deleted anyway, and only then the deleteTable is called (in DeleteTableHandler.handleTableOperation)
[jira] [Commented] (HBASE-5954) Allow proper fsync support for HBase
[ https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419738#comment-13419738 ] Lars Hofhansl commented on HBASE-5954: -- I think the API could go multiple ways (these are not mutually exclusive):
# hsync for HFiles (would guard compactions, etc., very lightweight), enabled with a config option (default on, I think)
# hsync all WAL edits (very expensive, but would not require client changes), enabled with a config option (default off)
# sync per Put. Gives control to the application. A batch put would hsync the WAL if at least one Put in the batch was marked with hsync. What about deletes? In 0.94 they are not batched; we could do it at the end of the operation there.
# Per RPC. Could send a flag with the RPC from the client, i.e. HTable would have a put(List<Put> puts, boolean hsync) method
# HTable.hsync. Client calls this when data must be sync'ed. Most flexible, but incurs an extra RPC to the RegionServer just to force the hsync.
Comments welcome. Allow proper fsync support for HBase Key: HBASE-5954 URL: https://issues.apache.org/jira/browse/HBASE-5954 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.2 Attachments: 5954-trunk-hdfs-trunk-v2.txt, 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk-v4.txt, 5954-trunk-hdfs-trunk-v5.txt, 5954-trunk-hdfs-trunk-v6.txt, 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt
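The per-Put option above can be sketched with hypothetical types (not the HBase client API): each Put carries an hsync flag, and a batch triggers a single WAL hsync if any Put in it is marked.

```java
import java.util.List;

// Hypothetical sketch of "sync per Put": the decision whether a batch append
// must be followed by an hsync of the WAL is the OR of the per-Put flags, so
// one durable Put in a batch pays one hsync for the whole batch.
public class WalSyncPolicy {
    public static class Put {
        public final boolean hsync;
        public Put(boolean hsync) { this.hsync = hsync; }
    }

    /** True if appending this batch must be followed by an hsync. */
    public static boolean batchNeedsHsync(List<Put> batch) {
        for (Put p : batch) {
            if (p.hsync) return true;
        }
        return false;
    }
}
```

This keeps the common non-durable path cheap while letting the application opt in per operation, unlike the config-option variants which apply globally.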
[jira] [Comment Edited] (HBASE-5954) Allow proper fsync support for HBase
[ https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419738#comment-13419738 ] Lars Hofhansl edited comment on HBASE-5954 at 7/21/12 4:01 AM: --- I think the API could go multiple ways (these are not mutually exclusive):
# hsync for HFiles (would guard compactions, etc., very lightweight), enabled with a config option (default on, I think)
# hsync all WAL edits (very expensive, but would not require client changes), enabled with a config option (default off)
# hsync for tables or column families for HFiles (configured in the table/column descriptor)
# hsync for tables or column families for the WAL (configured in the table/column descriptor)
# WAL hsync per Put. Gives control to the application. A batch put would hsync the WAL if at least one Put in the batch was marked with hsync. What about deletes? In 0.94 they are not batched; we could do it at the end of the operation there.
# WAL hsync per RPC. Could send a flag with the RPC from the client, i.e. HTable would have a put(List<Put> puts, boolean hsync) method
# HTable.hsync. Client calls this when the WAL must be sync'ed. Most flexible, but incurs an extra RPC to the RegionServer just to force the hsync.
Comments welcome. Edit: Forgot some options.
[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6406: - Fix Version/s: (was: 0.94.1) 0.94.2 TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently Key: HBASE-6406 URL: https://issues.apache.org/jira/browse/HBASE-6406 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.96.0, 0.94.2 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack Looking back through the 0.94 test runs these two tests accounted for 11 of 34 failed tests. They should be fixed or (temporarily) disabled.
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419746#comment-13419746 ] Zhihong Ted Yu commented on HBASE-5547: --- I think the latest patch has addressed HFile archival when deleteRegion() is called. Backing up / restoring a table can be addressed in HBASE-6055.
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419748#comment-13419748 ] Lars Hofhansl commented on HBASE-5547: -- True, but if dropping a table just drops the latest HFiles on the floor and leaves a partial backup around, this entire exercise is pointless. Anyway, from the code in DeleteTableHandler.handleTableOperation it looks like all regions are deleted first (using deleteRegion) and then the table directory is deleted, so it should be correct. Just making sure here.