[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215477#comment-13215477 ]

Hadoop QA commented on HBASE-5317:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515901/HBASE-5317to0.92.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1040//console

This message is automatically generated.

Fix TestHFileOutputFormat to work against hadoop 0.23
-----------------------------------------------------

                 Key: HBASE-5317
                 URL: https://issues.apache.org/jira/browse/HBASE-5317
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.92.0, 0.94.0
            Reporter: Gregory Chanan
            Assignee: Gregory Chanan
         Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, HBASE-5317to0.92.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml

Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92:

Failed tests:
  testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
  test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
  testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
  testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:

  testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215482#comment-13215482 ]

Mubarak Seyed commented on HBASE-4991:
--------------------------------------

bq. How does Accumulo do it do you know? You might get some ideas over there.

Will take a look. Todd's presentation highlights a comparison of HBase vs Accumulo: http://www.slideshare.net/cloudera/h-base-and-accumulo-todd-lipcom-jan-25-2012

Source:
https://svn.apache.org/repos/asf/incubator/accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/fate/ (master-coordinated tasks use Fate; refer to TStore.java)
https://svn.apache.org/repos/asf/incubator/accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/master/Master.java

Notes from TStore.java:
{code}
/**
 * Transaction Store: a place to save transactions
 *
 * A transaction consists of a number of operations. To use, first create a transaction id, and then seed the
 * transaction with an initial operation. An executor service can then execute the transaction's operation,
 * possibly pushing more operations onto the transaction as each step successfully completes.
 * If a step fails, the stack can be unwound, undoing each operation.
 */
{code}

For example, the delete-range operation in the master uses Fate to seed a transaction with a DELETE_RANGE table operation and submit a task; the executor service can then execute the op.
{code}
public void executeTableOperation(TInfo tinfo, AuthInfo c, long opid,
    org.apache.accumulo.core.master.thrift.TableOperation op,
    List<ByteBuffer> arguments, Map<String,String> options, boolean autoCleanup) {
  switch (op) {
    case DELETE_RANGE: {
      String tableName = ByteBufferUtil.toString(arguments.get(0));
      Text startRow = ByteBufferUtil.toText(arguments.get(1));
      Text endRow = ByteBufferUtil.toText(arguments.get(2));

      final String tableId = checkTableId(tableName, TableOperation.DELETE_RANGE);
      checkNotMetadataTable(tableName, TableOperation.DELETE_RANGE);
      verify(c, tableId, TableOperation.DELETE_RANGE,
          check(c, SystemPermission.SYSTEM) || check(c, tableId, TablePermission.WRITE));

      fate.seedTransaction(opid,
          new TraceRepo<Master>(new TableRangeOp(MergeInfo.Operation.DELETE, tableId, startRow, endRow)),
          autoCleanup);
      break;
    }
  }
}
{code}

Provide capability to delete named region
-----------------------------------------

                 Key: HBASE-4991
                 URL: https://issues.apache.org/jira/browse/HBASE-4991
             Project: HBase
          Issue Type: Improvement
            Reporter: Ted Yu
            Assignee: Mubarak Seyed
             Fix For: 0.94.0
         Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch

See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss.
User may want to quickly dispose of out-of-date records by deleting specific regions.
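The TStore javadoc quoted above describes the pattern compactly: seed a transaction with a first operation, execute steps until none remain, and unwind the stack on failure. A minimal sketch of that execute/unwind loop, where the names `Repo` and `FateSketch` are hypothetical illustrations and not Accumulo's actual API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Minimal sketch of a FATE-style transaction store, loosely modeled on the
 * TStore javadoc above. Repo and FateSketch are illustrative names only.
 */
public class FateSketch {
  /** One undoable step of a transaction. */
  interface Repo {
    Repo call() throws Exception; // execute; return the next step, or null when done
    void undo();                  // unwind this step if a later step fails
  }

  /** Run steps until done; on failure, undo completed steps in reverse order. */
  static boolean execute(Repo first) {
    Deque<Repo> completed = new ArrayDeque<>();
    try {
      Repo step = first;
      while (step != null) {
        Repo next = step.call(); // may push more work by returning another Repo
        completed.push(step);    // record it so it can be undone later
        step = next;
      }
      return true;
    } catch (Exception e) {
      while (!completed.isEmpty()) {
        completed.pop().undo();  // unwind the stack, undoing each operation
      }
      return false;
    }
  }
}
```

A real FATE store also persists the transaction id and step stack so the unwind survives a master restart; this sketch keeps everything in memory to show only the control flow.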
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
    Attachment: Filtered_scans_v3.patch

Fixed all failed tests, added a test for the joined-scanner functionality.

Improve performance of scans with some kind of filters.
-------------------------------------------------------

                 Key: HBASE-5416
                 URL: https://issues.apache.org/jira/browse/HBASE-5416
             Project: HBase
          Issue Type: Improvement
          Components: filters, performance, regionserver
    Affects Versions: 0.90.4
            Reporter: Max Lapan
            Assignee: Max Lapan
         Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch

When a scan is performed, the whole row is loaded into the result list, and only then is the filter (if any) applied to decide whether the row is needed. But when the scan covers several CFs and the filter checks data from only a subset of them, data from the CFs not checked by the filter is not needed at the filter stage; it is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter.

For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and quite costly to scan. If we need only the rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed.

The attached patch adds one routine to the Filter interface to allow a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
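The two-group scan described in the issue can be modeled in miniature: read only the essential CF for every row, apply the filter, and touch the joined CFs only for accepted rows. Everything here (`JoinedScanSketch`, the `loads` counter, rows as plain maps) is an illustrative stand-in, not HBase's actual scanner code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Miniature model of the patch's two-group scan. The essential CF is read for
 * every row; joined CFs are read only for rows the filter accepts.
 */
public class JoinedScanSketch {
  /** Counts column-family reads, standing in for the IO a real scan would do. */
  static int loads = 0;

  static String load(Map<String, String> row, String family) {
    loads++;
    return row.get(family);
  }

  /** Scan rows, loading joined families only after the filter accepts the row. */
  static List<Map<String, String>> scan(List<Map<String, String>> rows,
      String essentialFamily, Predicate<String> filter, List<String> joinedFamilies) {
    List<Map<String, String>> results = new ArrayList<>();
    for (Map<String, String> row : rows) {
      String value = load(row, essentialFamily);  // small CF: always read
      if (!filter.test(value)) {
        continue;                                 // rejected: joined CFs never touched
      }
      Map<String, String> out = new LinkedHashMap<>();
      out.put(essentialFamily, value);
      for (String family : joinedFamilies) {
        out.put(family, load(row, family));       // large CF: read only when needed
      }
      results.add(out);
    }
    return results;
  }
}
```

With a small flags CF and a huge snap CF, the saving is exactly the snap reads skipped for rejected rows, which is where the reported 30-50x on heavily filtered scans comes from.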
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
    Status: Patch Available  (was: Open)

                 Key: HBASE-5416
                 URL: https://issues.apache.org/jira/browse/HBASE-5416
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215502#comment-13215502 ]

Zhihong Yu commented on HBASE-5416:
-----------------------------------

{code}
+      KeyValue nextKV = this.joinedHeap.peek();
+      while (true) {
+        this.joinedHeap.next(results, limit - results.size());
+        nextKV = this.joinedHeap.peek();
{code}
I think the first peek() isn't needed because there is another peek() inside the loop.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215504#comment-13215504 ]

Max Lapan commented on HBASE-5416:
----------------------------------

@Thomas: Yes, this is the primary goal of this patch. When CF_B is large, we'll load only the needed blocks from it (via seek), which could give a huge speedup in the scan.

@Zhihong: Thanks, I'll fix this; now waiting on the jenkins results. Didn't know about reviews.apache.org, thanks. I'll post there, of course :).
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215519#comment-13215519 ]

Hadoop QA commented on HBASE-5416:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515904/Filtered_scans_v3.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -133 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//console

This message is automatically generated.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215521#comment-13215521 ]

Zhihong Yu commented on HBASE-5416:
-----------------------------------

The following line is too long:
{code}
+    if (this.joinedHeap != null && this.joinedHeap.seek(KeyValue.createFirstOnRow(currentRow))) {
{code}
Please limit lines to 80 chars. You can get the Eclipse formatter from HBASE-3678.
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
    Status: Open  (was: Patch Available)

                 Key: HBASE-5416
                 URL: https://issues.apache.org/jira/browse/HBASE-5416
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
    Attachment: Filtered_scans_v4.patch

Fixed comment, removed the extra peek() call, and folded the long line.

                 Key: HBASE-5416
                 URL: https://issues.apache.org/jira/browse/HBASE-5416
         Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
    Status: Patch Available  (was: Open)

                 Key: HBASE-5416
                 URL: https://issues.apache.org/jira/browse/HBASE-5416
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215529#comment-13215529 ]

Max Lapan commented on HBASE-5416:
----------------------------------

@Zhihong: Having trouble posting a new review request - it gives a 500 error. Maybe this is related to the apache jira issues; will try later.
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215550#comment-13215550 ] Max Lapan commented on HBASE-5416: -- @stack: Documentation paragraph to include. I think it should go here: http://hbase.apache.org/book.html#number.of.cfs {quote} There is a performance consideration to keep in mind in schema design. In some situations, a two (or more) column family schema can be much faster than a single-CF design. This is the case when one small column family is used to sieve larger rows from the other families. If a SingleColumnValueFilter or SingleColumnValueExcludeFilter is used to find the needed rows, only the small column family is scanned; the other families are loaded only once a matching row has been found. This can significantly reduce the amount of data loaded and lead to faster scans. {quote} Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch When a scan is performed, the whole row is loaded into the result list, and only afterwards is the filter (if any) applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks data from only a subset of them, the data from the unchecked CFs is not needed at the filter stage; it is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and quite costly to scan.
If we need only rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate column families by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
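To make the two-phase scan order described above concrete, here is a minimal, self-contained sketch in plain Java (no HBase dependency; the class, family, and row names are all hypothetical): it reads the small 'flags' family first and loads the large 'snap' family only for rows the filter accepts, which is the essence of the joined-scanner approach.

```java
import java.util.*;

// Illustrative sketch only: simulates the "essential family first" scan order,
// without any HBase dependency. All names here are hypothetical.
public class EssentialFamilyScan {
    // Returns rows whose small "flags" value matches `wanted`, counting in
    // snapLoads[0] how many large "snap" values actually had to be loaded.
    public static List<String> scan(Map<String, String> flags,
                                    Map<String, String> snap,
                                    String wanted,
                                    int[] snapLoads) {
        List<String> result = new ArrayList<>();
        for (String row : new TreeSet<>(flags.keySet())) {
            // Phase 1: load only the essential (filtered) family.
            if (!wanted.equals(flags.get(row))) continue; // rejected: snap never read
            // Phase 2: row accepted, now load the expensive family.
            snapLoads[0]++;
            result.add(row + "=" + snap.get(row));
        }
        return result;
    }
}
```

In the real patch the same effect is achieved inside HRegion by splitting the store scanners into an essential group and a joined group; this sketch only shows why the IO saved is proportional to the filter's selectivity.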
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-5416: - Status: Open (was: Patch Available) There is still a mistake somewhere, our stats scan return different results.
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-5455: - Status: Open (was: Patch Available) Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Priority: Minor Fix For: 0.94.0 HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
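The guard test proposed in the description can be sketched as follows. This is an illustrative stand-in in plain Java, not the actual HbaseObjectWritable code; the class names and the `add` helper are hypothetical, but the pattern (a counter incremented per entry, frozen against an expected map) is the one described above.

```java
import java.util.*;

// Sketch of the proposed guard test, with a stand-in for HbaseObjectWritable's
// static block. Entry names and helper methods here are hypothetical.
public class CodeMapGuard {
    static final Map<String, Integer> CLASS_TO_CODE = new LinkedHashMap<>();

    // Mirrors the "local variable incremented after each use" pattern:
    static void add(String className) {
        CLASS_TO_CODE.put(className, CLASS_TO_CODE.size() + 1);
    }

    static {
        add("Boolean");   // inserting a new add() between existing ones would
        add("Integer");   // shift every code below it and silently break the
        add("Text");      // wire format for old clients
    }

    // The test freezes the expected codes; any unintentional reordering fails it,
    // and an intentional change forces a conscious update of the expected map.
    public static boolean codesMatch(Map<String, Integer> expected) {
        return CLASS_TO_CODE.equals(expected);
    }
}
```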
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-5455: - Status: Patch Available (was: Open) Added a test case for class to int mapping in HbaseObjectWritable to ensure wire compatibility.
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-5455: - Attachment: HBASE-5455.diff Updated TestHbaseObjectWritable to error on class code changes that would affect the wire protocol.
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5416: -- Attachment: 5416-v5.txt Patch v5 is based on v4, with grammatical corrections. @Max: What do you think? @Override is missing for isFamilyEssential() in a few files.
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5416: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5416: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5416: -- Attachment: 5416-v6.txt Same as patch v5. I verified that patch v6 can be used to generate new review request.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215703#comment-13215703 ] Zhihong Yu commented on HBASE-5317: --- Integrated to 0.92 branch. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, HBASE-5317to0.92.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215720#comment-13215720 ] Nicolas Spiegelberg commented on HBASE-5416: Overall, I agree that this is a useful design pattern. We use this pattern in our messages deployment and in other production use cases as well. I'm more concerned about this being in the critical path. This is deep in the core logic, which has a lot of complicated usage and is extremely bug-prone (even after extensive unit tests). If you don't need atomicity, then you don't get much benefit from solving this in the critical path. The change introduces a lot of risk and design decisions that we have to worry about years later. It might be some work to understand how to use a batch factor; but don't you think it would take more work to understand the variety of use cases for scans, to ensure that we don't introduce side effects, and to make a scalable architectural decision? At the very least, we should get a scan expert to look at this code before committing. I'm not one, but I know this isn't the same as making a business logic change. I just have one question about the patch right now: should we have unit test cases ensuring the interop between this feature and 'limit'? For example, ensure that joinedHeap is scanned before going to the next row if the storeHeap results.size() == limit.
[jira] [Updated] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5437: -- Resolution: Fixed Status: Resolved (was: Patch Available) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch, HBASE-5437.D1887.2.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS. java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47) at org.apache.hadoop.hbase.thrift.ThriftServerRunner.init(ThriftServerRunner.java:239) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215757#comment-13215757 ] Lars Hofhansl commented on HBASE-5075: -- Actually, the patches do not apply cleanly to HBase trunk. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster knows about the regionserver's shutdown it takes a long time to recover the HLog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I consider the RS down, delete the znode, and force-close the HLog file. The detection period could then be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215759#comment-13215759 ] Adrian Muraru commented on HBASE-5351: -- Saw the same issue in the 0.92 branch and traced it down to the same {noformat}this.hbAdmin.createTableAsync(htd, keys);{noformat} and I am wondering why we wouldn't change this to {noformat}this.hbAdmin.createTable{noformat} instead of looping and waiting for the table to become available. hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351.patch I have a test that tests vanilla use of importtsv with the importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META.
for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed.
The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test. {code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr < TABLE_CREATE_MAX_R
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
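The shape of the proposed fix, polling for table availability instead of racing the async create, can be sketched as a generic retry loop. This is plain Java with a BooleanSupplier standing in for hbAdmin.isTableAvailable, not the actual LoadIncrementalHFiles code, and the real patch would sleep between retries:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of the retry loop in the proposed fix. The supplier is a
// hypothetical stand-in for hbAdmin.isTableAvailable(tableName).
public class WaitForTable {
    public static boolean waitUntilAvailable(BooleanSupplier isTableAvailable,
                                             int maxRetries) {
        int ctr = 0;
        while (!isTableAvailable.getAsBoolean() && ctr < maxRetries) {
            ctr++;  // the real code would sleep here before re-checking
        }
        return isTableAvailable.getAsBoolean();
    }
}
```

Bounding the loop with maxRetries keeps a permanently missing table from hanging the bulk load forever, which is why the original patch compares ctr against TABLE_CREATE_MAX_RETRIES rather than spinning unconditionally.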
[jira] [Commented] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215769#comment-13215769 ] Himanshu Vashishtha commented on HBASE-4348: I have created a patch, which involves a new method in org.apache.hadoop.hbase.master.AssignmentManager and supporting code in src/main/jamon/org/apache/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon. I am running it on my local system, and wonder about how to test this, i.e., to get some regions in RIT. Any suggestions please? Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob The following metrics would be useful for monitoring the master: - the number of regions in transition - the number of regions in transition that have been in transition for more than a minute - how many seconds has the oldest region-in-transition been in transition -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
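The three metrics requested in the description could be computed as below. This is a hedged, self-contained sketch that assumes a map from region name to the time (in ms) at which it entered transition; the class and method names are hypothetical, not the actual AssignmentManager API:

```java
import java.util.*;

// Sketch of the three regions-in-transition (RIT) metrics. Input is a
// hypothetical map of region name -> timestamp (ms) when it entered transition.
public class RitMetrics {
    // Number of regions currently in transition.
    public static int ritCount(Map<String, Long> ritStart) {
        return ritStart.size();
    }

    // Number of regions in transition longer than thresholdMs (e.g. one minute).
    public static int ritOverThreshold(Map<String, Long> ritStart, long now, long thresholdMs) {
        int n = 0;
        for (long start : ritStart.values()) {
            if (now - start > thresholdMs) n++;
        }
        return n;
    }

    // Age in ms of the oldest region-in-transition (0 if none).
    public static long oldestRitAgeMs(Map<String, Long> ritStart, long now) {
        long oldest = 0;
        for (long start : ritStart.values()) {
            oldest = Math.max(oldest, now - start);
        }
        return oldest;
    }
}
```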
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215771#comment-13215771 ] Max Lapan commented on HBASE-5416: -- @Nicolas: I still have no idea how to resolve our slow-scans problem a different way. A two-phase RPC would be very inefficient in a map-reduce job, where we need to issue lots of gets for each obtained 'flag' row and have no good place to save them up for a multi-get (which could be huge in some cases). Batching also helps little there, because the slowness is caused not by large Results but by the tons of useless work a regionserver performs on such scans. Or maybe I missed something? I agree that this solution is not elegant and complicates the scan machinery, but all the other approaches look worse. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch When a scan is performed, the whole row is loaded into the result list, and only afterwards is the filter (if one exists) applied to decide whether the row is needed. But when the scan covers several CFs and the filter checks data from only a subset of them, the data from the CFs the filter does not check is not needed at the filter stage; it is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and quite costly to scan.
If we need only the rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs its operation needs. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
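Max's two-group scanner idea can be illustrated without the HBase scanner machinery. The sketch below uses plain maps (all names hypothetical): the filter is evaluated against the small "flags" family first, and the expensive "snap" family is loaded only for rows the filter accepts, which is exactly the saving the patch is after:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only, with plain maps instead of HBase scanners: the filter
// sees only the small "flags" family; the large "snap" family is read only
// for rows that pass (the "joined" scanner group in the patch).
public class EssentialCfSketch {
    static int snapLoads = 0; // counts reads of the expensive family

    static String loadSnap(String row) {
        snapLoads++;
        return "snap-data-for-" + row;
    }

    static List<String> scan(Map<String, String> flags, Set<String> wanted) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> e : flags.entrySet()) {
            if (wanted.contains(e.getValue())) {   // filter checks only "flags"
                results.add(loadSnap(e.getKey())); // "snap" loaded only on match
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("row1", "keep");
        flags.put("row2", "skip");
        flags.put("row3", "keep");
        List<String> out = scan(flags, Set.of("keep"));
        // Only 2 of 3 rows touched the expensive family.
        System.out.println(out.size() + " results, " + snapLoads + " snap reads");
    }
}
```

Mikhail's atomicity concern lives in the gap between the two loads: in a real region, the row may change between reading "flags" and reading "snap", which is why the multithreaded test he asks for matters.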
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215784#comment-13215784 ] Hudson commented on HBASE-5317: --- Integrated in HBase-0.92 #302 (See [https://builds.apache.org/job/HBase-0.92/302/]) HBASE-5317 Fix TestHFileOutputFormat to work against hadoop 0.23 (Gregory Taylor) (Revision 1293306) Result = SUCCESS tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/pom.xml * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, HBASE-5317to0.92.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable 
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215785#comment-13215785 ] Jesse Yates commented on HBASE-5075: Haven't had a chance to look at the latest patch yet, but have read through the docs. I have the same concern as Lars, namely, bq. a bit worried about maintaining an additional process on every machine What about doing something a bit simpler, like adding a runtime shutdown hook to the RS so that the region server updates ZK or the master when it decides to bail out? Even something as simple as just removing your own znode on failure would be sufficient to cover this use case, correct? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows about the regionserver's shutdown, it takes a long time to recover the hlog's lease. HBase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if that pid does not exist, I assume the RS is down, delete its znode, and force-close the hlog file. With this the detection period could be around 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215789#comment-13215789 ] Mikhail Bautin commented on HBASE-5416: --- @Max: if you scan the 'flag' column family first, find the rows that you are interested in, and query only those rows from the 'snap' column family, you will avoid the slowness from scanning every row in 'snap'. With proper batching, the two-pass approach should work fine if you don't need atomicity. The problem with such deep changes to the scanner framework is that it would require comprehensive new unit tests. The included unit test only writes three rows and does not really check the new feature (or the old functionality) on a large scale. Take a look at TestMultiColumnScanner and TestSeekOptimizations. We will need something at least as comprehensive as those tests for this improvement, probably even a multithreaded test case to ensure we don't break atomicity. If we do not do that testing now, we will still have to do it before the next stable release, but it would be unfair to pass the hidden costs of testing to those who don't need this particular optimization right now but will soon need a stable system for another production release. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch When a scan is performed, the whole row is loaded into the result list, and only afterwards is the filter (if one exists) applied to decide whether the row is needed. But when the scan covers several CFs and the filter checks data from only a subset of them, the data from the CFs the filter does not check is not needed at the filter stage.
It is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and quite costly to scan. If we need only the rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface that lets a filter specify which CFs its operation needs. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215796#comment-13215796 ] stack commented on HBASE-5075: -- bq. Even something as simple as just removing your own znode on failure would be sufficient to cover this use case, correct? Let's do that regardless. Good idea. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows about the regionserver's shutdown, it takes a long time to recover the hlog's lease. HBase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if that pid does not exist, I assume the RS is down, delete its znode, and force-close the hlog file. With this the detection period could be around 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
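Jesse's shutdown-hook suggestion can be sketched with an in-memory set standing in for ZooKeeper (the znode path and the set are hypothetical; real code would call ZooKeeper's delete on the server's ephemeral znode):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the shutdown-hook idea: when the region server JVM decides to
// bail out, a hook removes its znode immediately instead of waiting for the
// ZK session timeout. The set below is a stand-in for ZooKeeper state.
public class ZnodeCleanup {
    static final Set<String> znodes = ConcurrentHashMap.newKeySet();

    static void registerHook(String znodePath) {
        // Runs on normal exit or uncaught-exception exit; a hard kill -9
        // still falls back to the session timeout, so this only narrows
        // the window, it does not replace the external monitor entirely.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> znodes.remove(znodePath))); // real code: zk.delete(path, -1)
    }

    public static void main(String[] args) {
        String path = "/hbase/rs/server1,60020"; // hypothetical znode path
        znodes.add(path);
        registerHook(path);
        System.out.println(znodes.contains(path)); // still present until exit
    }
}
```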
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5455: - Assignee: Michael Drzal Status: Patch Available (was: Open) Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Assignee: Michael Drzal Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5455.diff HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215798#comment-13215798 ] Jean-Daniel Cryans commented on HBASE-4365: --- FWIW running a 5TB upload took 18h. Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0 Reporter: Todd Lipcon Assignee: stack Priority: Critical Labels: usability Fix For: 0.94.0 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215800#comment-13215800 ] stack commented on HBASE-5351: -- @Adrian That seems like the way to go. hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably in order to write a test. {code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr < TABLE_CREATE_MAX_R
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
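The essence of the one-line fix is a bounded poll on table availability after the async create. The sketch below is a generic version of that pattern, not the LoadIncrementalHFiles code; the BooleanSupplier stands in for `this.hbAdmin.isTableAvailable(tableName)`, and the retry limit plays the role of the truncated `TABLE_CREATE_MA...` constant in the diff:

```java
import java.util.function.BooleanSupplier;

// Generic sketch of the fix's pattern: after createTableAsync, poll until the
// table is reported available, bounded by a retry cap so a create that never
// completes cannot hang the bulk load forever.
public class WaitForTable {
    static boolean waitAvailable(BooleanSupplier isAvailable, int maxRetries,
                                 long sleepMs) throws InterruptedException {
        int ctr = 0;
        while (!isAvailable.getAsBoolean() && ctr < maxRetries) {
            Thread.sleep(sleepMs);
            ctr++;
        }
        return isAvailable.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        // Becomes available on the third poll, like an async create finishing.
        BooleanSupplier table = () -> ++calls[0] >= 3;
        System.out.println(waitAvailable(table, 10, 1)); // true
    }
}
```

Asking the admin interface rather than constructing an HTable first is the actual fix: the HTable constructor itself was racing against the async create, as the TableNotFoundException trace shows.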
[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215802#comment-13215802 ] stack commented on HBASE-5455: -- +1 Excellent Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Assignee: Michael Drzal Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5455.diff HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
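The hazard the proposed test guards against is easy to model. The sketch below is not HbaseObjectWritable itself but a toy code map built the same way, with an incrementing counter in a static block, plus a check that pins each entry to its recorded code (entry names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the hazard: codes come from an incrementing counter in a
// static block, so inserting an entry mid-block silently renumbers every
// entry after it and breaks wire compatibility. Pinning each name to its
// expected code turns that silent break into a loud test failure.
public class CodeMapGuard {
    static final Map<String, Integer> CODE_TO_CLASS = new LinkedHashMap<>();
    static {
        int code = 0;
        CODE_TO_CLASS.put("ArrayList", code++);
        CODE_TO_CLASS.put("byte[]", code++);
        // Adding a line here would shift every code below it.
        CODE_TO_CLASS.put("Put", code++);
    }

    static void verifyFrozenCodes() {
        // Expected values recorded once, at the time the codes were frozen.
        if (CODE_TO_CLASS.get("ArrayList") != 0) throw new AssertionError("ArrayList moved");
        if (CODE_TO_CLASS.get("byte[]") != 1) throw new AssertionError("byte[] moved");
        if (CODE_TO_CLASS.get("Put") != 2) throw new AssertionError("Put moved");
    }

    public static void main(String[] args) {
        verifyFrozenCodes();
        System.out.println("codes stable");
    }
}
```

An intentional change then forces the expected-value table to be edited too, which is exactly the conscious decision the issue asks for.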
[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215803#comment-13215803 ] Jean-Daniel Cryans commented on HBASE-4365: --- Oh and no concurrent mode failures, as I don't use dumb configurations. Also my ZK timeout is set to 20s. Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0 Reporter: Todd Lipcon Assignee: stack Priority: Critical Labels: usability Fix For: 0.94.0 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-59) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
[ https://issues.apache.org/jira/browse/HBASE-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-59. Resolution: Won't Fix We are not going to change this now I'd say (This issue is 4+ years old) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HBASE-59 URL: https://issues.apache.org/jira/browse/HBASE-59 Project: HBase Issue Type: Improvement Components: mapred Reporter: Michael Bieniosek Priority: Trivial mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-587) Add auto-primary-key feature
[ https://issues.apache.org/jira/browse/HBASE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-587. - Resolution: Won't Fix Doing as Harsh suggests. Hard to do this feature in a scalable way. If wanted, we could do something like cassandra's time-based UUID to mint UUIDs that go in a chronological direction if that'd help. Add auto-primary-key feature Key: HBASE-587 URL: https://issues.apache.org/jira/browse/HBASE-587 Project: HBase Issue Type: New Feature Reporter: Bryan Duxbury Priority: Trivial Some folks seem to be interested in having their row keys automatically generated in a unique fashion. Maybe we could do something like allow the user to specify they want an automatic key, and then we'll generate a GUID that's unique for that table and return it as part of the commit. Not sure what the mechanics would look like exactly, but seems doable and it's going to be a more prevalent use case as people start to put data into HBase first without touching another system or pushing data without a natural unique primary key. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
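The time-ordered key idea stack mentions in the resolution can be sketched as follows (the format and helper are hypothetical, not an HBase API): a millisecond timestamp, zero-padded so lexical order matches chronological order, plus a random suffix to avoid collisions between writers minting keys in the same millisecond:

```java
import java.security.SecureRandom;

// Hypothetical key minter, illustrating the "UUIDs that go in a chronological
// direction" idea: keys sort by creation time because the fixed-width
// timestamp prefix dominates the comparison.
public class TimeOrderedKey {
    static final SecureRandom RND = new SecureRandom();

    static String mint(long timestampMs) {
        // 13 zero-padded decimal digits of time, then 8 hex digits of randomness
        return String.format("%013d-%08x", timestampMs, RND.nextInt());
    }

    public static void main(String[] args) {
        String a = mint(1_000L);
        String b = mint(2_000L);
        System.out.println(a.compareTo(b) < 0); // true: earlier time sorts first
    }
}
```

Note this trades away one of HBase's usual design goals: monotonically increasing keys concentrate all writes on one region, which is part of why the feature was hard to do scalably.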
[jira] [Resolved] (HBASE-765) Adding basic Spring DI support to IndexConfiguration class.
[ https://issues.apache.org/jira/browse/HBASE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-765. - Resolution: Won't Fix We don't have IndexConfiguration anymore in our code base. Won't fix. Adding basic Spring DI support to IndexConfiguration class. --- Key: HBASE-765 URL: https://issues.apache.org/jira/browse/HBASE-765 Project: HBase Issue Type: Improvement Components: mapred Affects Versions: 0.16.0, 0.1.0, 0.1.1, 0.1.2, 0.1.3 Environment: n/a Reporter: Ryan Smith Priority: Minor Original Estimate: 20m Remaining Estimate: 20m Spring can configure classes/object graphs via XML. I am pretty much able to configure the entire MR object graph to launch MR jobs via Spring, except for the class IndexConfiguration.java. So instead of only using addFromXML() to configure IndexConfiguration, it would be nice to add support so Spring could set all class variables needed for initialization in IndexConfiguration without invoking addFromXML(). Since the class IndexConfiguration already has setters and getters for almost all its members, it's almost compliant as a Spring configuration bean except for one issue: there is no ability to configure columnMap outside of calling addFromXML(). The easiest way I can figure is to allow a setter for the column map and put any logic for checking the map integrity there. Adding a few methods to IndexConfiguration.java should solve the issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1012) [performance] Try doctoring a dfsclient so it shortcircuits hdfs when blocks are local
[ https://issues.apache.org/jira/browse/HBASE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1012. -- Resolution: Duplicate This is done, available in hdfs. [performance] Try doctoring a dfsclient so it shortcircuits hdfs when blocks are local -- Key: HBASE-1012 URL: https://issues.apache.org/jira/browse/HBASE-1012 Project: HBase Issue Type: Task Components: performance Reporter: stack Ning Li up on list has stated that getting blocks through hdfs even though the block is local takes almost the same amount of time as accessing the block over the network. See if we can do something smarter when the data is known to be local, short-circuiting hdfs if we can in a subclass of DFSClient (George Porter suggestion). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1339) NPE in HCM.procesRow called from master.jsp
[ https://issues.apache.org/jira/browse/HBASE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1339. -- Resolution: Won't Fix No longer pertinent. We don't see this any more. NPE in HCM.procesRow called from master.jsp --- Key: HBASE-1339 URL: https://issues.apache.org/jira/browse/HBASE-1339 Project: HBase Issue Type: Bug Reporter: stack {code}
2009-04-22 02:10:34,710 WARN /: /master.jsp: java.lang.NullPointerException
 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$1.processRow(HConnectionManager.java:344)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:64)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:29)
 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:351)
 at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:121)
 at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:121)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
 at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
 at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
 at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
 at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
 at org.mortbay.http.HttpServer.service(HttpServer.java:954)
 at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
 at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
 at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
 at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
 at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
 at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1748) ClusterStatus needs to print out who has master role
[ https://issues.apache.org/jira/browse/HBASE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1748. -- Resolution: Duplicate Fixed by 'HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup' ClusterStatus needs to print out who has master role Key: HBASE-1748 URL: https://issues.apache.org/jira/browse/HBASE-1748 Project: HBase Issue Type: Bug Components: master Reporter: stack Priority: Trivial Attachments: HBASE-1748.patch Is in zk_dump but not in clusterstatus. You need it when you have 5 masters and you are trying to find the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-1559) IllegalThreadStateException during LocalHBaseCluster shutdown if more than one regionserver is started
[ https://issues.apache.org/jira/browse/HBASE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1559. -- Resolution: Won't Fix We don't see this anymore. Reopen if it happens again. IllegalThreadStateException during LocalHBaseCluster shutdown if more than one regionserver is started -- Key: HBASE-1559 URL: https://issues.apache.org/jira/browse/HBASE-1559 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Priority: Minor IllegalThreadStateException during LocalHBaseCluster shutdown if more than one regionserver is started: {noformat}
Thread [RegionServer:1] (Suspended (exception IllegalThreadStateException))
 FileSystem$ClientFinalizer(Thread).start() line: 595
 HRegionServer.runThread(Thread,long) line: 691
 HRegionServer.run() line: 675
 LocalHBaseCluster$RegionServerThread(Thread).run() line: 691
{noformat} If started with only one region server, shutdown is clean. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
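The trace reflects the general JVM rule that a Thread object may be started at most once; with two region servers shutting down in one JVM, the shared FileSystem$ClientFinalizer thread was likely started a second time. A minimal, self-contained reproduction of the exception:

```java
// Starting the same Thread object twice always throws
// IllegalThreadStateException, regardless of whether the first run finished.
public class DoubleStart {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {});
        t.start();
        t.join(); // first run has fully completed
        try {
            t.start(); // second start of the same Thread object
            System.out.println("no exception");
        } catch (IllegalThreadStateException e) {
            System.out.println("IllegalThreadStateException"); // always this branch
        }
    }
}
```

The usual fix is to guard the shared thread with a started flag or to create a fresh Thread per shutdown, which matches this bug fading away as the shutdown path was reworked.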
[jira] [Resolved] (HBASE-1109) Explore the possibility of storing the configuration files in Zookeeper
[ https://issues.apache.org/jira/browse/HBASE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1109. -- Resolution: Won't Fix This is a duplicate of HBASE-3909 I'd say; also it's an axiom of ours that we not put permanent data into zk, which this issue would seem to imply Explore the possibility of storing the configuration files in Zookeeper --- Key: HBASE-1109 URL: https://issues.apache.org/jira/browse/HBASE-1109 Project: HBase Issue Type: New Feature Reporter: Jean-Daniel Cryans Priority: Minor Someone on IRC was saying that Google uses Chubby to store their configuration files. We should explore that solution with ZK. It has big benefits IMO.
[jira] [Resolved] (HBASE-1213) [performance] Investigate Locking Contention in the Write Path
[ https://issues.apache.org/jira/browse/HBASE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-1213. -- Resolution: Duplicate Resolving as duplicate of the WAL batching work that has been done of late; this issue talks about batching going into WAL [performance] Investigate Locking Contention in the Write Path Key: HBASE-1213 URL: https://issues.apache.org/jira/browse/HBASE-1213 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.19.0 Reporter: Ben Maurer Assignee: stack When doing a large number of bulk updates from different clients, I noticed that there was a high level of lock contention for stuff like locking the HLog. It seems that each thread acquires the lock for a single BatchUpdate, releases the lock, then another thread owns the lock before the initial writer gets to the next update. Having the threads bounce around may lead to suboptimal performance. Should be benchmarked and maybe changed to have less context switching.
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215861#comment-13215861 ] Devaraj Das commented on HBASE-5451:

Thanks Stack, for the quick but detailed review...

bq. if (!head.hasUserInfo()) return;
bq. .. Then you'd save an indent of the whole body of the method.
Makes sense

bq. Seems like ticket should be renamed user (we seem to be creating a user rather than a ticket?) here – I like the way you ask user to create passing the header:
Makes sense

bq. Is ConnectionContext actually the headers? Should it be called ConnectionHeader?
Ok

bq. What is this – HBaseCompleteRpcRequestProto? Its 'The complete RPC request message'. Its the callid and the client request. Is it the complete request because its missing the header? Should it just be called Request since its inside a package that makes its provenance clear? I suppose request would be odd because you then do getRequest on it... hmm.
The CompleteRPCRequest message is composed of the RPC callID and the application RPC message (currently either a Writable or a PB). I wanted to distinguish between the two, but let me look at renaming ..

bq. Why tunnelRequest. What's that mean?
Currently, the RPC client only works with Writables. We will need to tunnel Writable RPC messages until we have PB for all the app layer protocols. Kindly have a look at the client side where the writable RPC message is serialized for sending it to the server.

bq. Fatten doc on the proto file I'd say. Its going to be our spec.
Ok

bq. Can these proto classes drop the HBaseRPC prefix? Is the Proto suffix going to be our convention denoting Proto classes going forward?
Will drop the prefix. But I guess the suffix should stay..

bq. Are we going to repeat the hrpc exception handling carrying Strings for exceptions from server to client?
Haven't done anything on this one yet. Let me see (this could be a separate jira IMO).
Switch RPC call envelope/headers to PBs --- Key: HBASE-5451 URL: https://issues.apache.org/jira/browse/HBASE-5451 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Devaraj Das Attachments: rpc-proto.patch.1_2
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-5351: -- Attachment: HBASE-5351-v1.patch *attached HBASE-5351-v1.patch* @Adrian and stack: Agreed, I was just trying to make a minimal change. New patch as suggested. hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351-v1.patch, HBASE-5351.patch I have a test that tests vanilla use of importtsv with the importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.
{code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr < TABLE_CREATE_MAX_R
{code}
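The fix described above is essentially a bounded poll: keep retrying `isTableAvailable` until it returns true or a retry cap is hit. A minimal, self-contained sketch of that pattern follows; the `BooleanSupplier` stands in for `hbAdmin.isTableAvailable(tableName)`, and the retry count and sleep interval are invented for illustration, not HBase's actual values:

```java
import java.util.function.BooleanSupplier;

public class TableWait {
    // Assumed values for illustration; not HBase's actual constants.
    static final int TABLE_CREATE_MAX_RETRIES = 20;
    static final long SLEEP_MS = 50;

    // Polls the condition (a stand-in for hbAdmin.isTableAvailable(tableName))
    // until it holds or the retry budget is exhausted.
    public static boolean waitUntilAvailable(BooleanSupplier isAvailable) {
        int ctr = 0;
        while (!isAvailable.getAsBoolean() && ctr < TABLE_CREATE_MAX_RETRIES) {
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // give up if interrupted
            }
            ctr++;
        }
        return isAvailable.getAsBoolean();
    }

    public static void main(String[] args) {
        // Simulate a table that becomes visible on the third check.
        final int[] checks = {0};
        boolean ok = waitUntilAvailable(() -> ++checks[0] >= 3);
        System.out.println(ok); // prints true
    }
}
```

The key point of the patch is the same as here: the wait loop asks the authority that performed the creation (the admin), rather than constructing a new `HTable` whose setup itself races with table creation.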
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215885#comment-13215885 ] Adrian Muraru commented on HBASE-5351: -- Great, what about try/catch java.net.SocketTimeoutException. Don't think is needed anymore when sync createTable is used. Let's let any exception thrown by createTable() call bubble up. What do you say?
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215887#comment-13215887 ] Hadoop QA commented on HBASE-5351: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515962/HBASE-5351-v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -131 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1044//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1044//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1044//console This message is automatically generated. 
[jira] [Resolved] (HBASE-2073) IllegalArgumentException causing regionserver failure
[ https://issues.apache.org/jira/browse/HBASE-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-2073. -- Resolution: Won't Fix Not enough detail and I don't think we've seen this lately IllegalArgumentException causing regionserver failure - Key: HBASE-2073 URL: https://issues.apache.org/jira/browse/HBASE-2073 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.2 Environment: Ubuntu 8.10, Java 1.6.0_10, HBase 0.20.2 Reporter: Greg Lu Priority: Minor Attachments: hbase-hadoop-regionserver-factory05.lab.mtl.log After a regionserver went down last night, I checked its logs and found the following exception: 2009-12-29 00:17:27,663 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll /hbase/amsterdam_factory/.logs/factory05.lab.mtl,60020,1262042255724/hlog.dat.1262060247637, entries=1830, calcsize=22946017, filesize=22758899. New hlog /hbase/amsterdam_factory/.logs/factory05.lab.mtl,60020,1262042255724/hlog.dat.1262063847659 2009-12-29 00:34:36,210 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1114) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:189) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at 
java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2009-12-29 00:34:36,214 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020, call next(4170645244799815171, 1) from 192.168.1.108:53401: error: java.io.IOException: java.lang.IllegalArgumentException java.io.IOException: java.lang.IllegalArgumentException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1114) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:189) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) ... 
5 more Looks like this bug was encountered before at https://issues.apache.org/jira/browse/HBASE-1495 and spanned a few JIRAs. It's supposed to be resolved as of 0.20.0, but we're running 0.20.2 and it took down one of our regionservers. I'm also attaching more of the log.
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215895#comment-13215895 ] stack commented on HBASE-5451: -- Ok on the tunnel thing. Maybe comment it some more (if you haven't already) in code. Yeah on suffix. We need convention I'd say distinguishing the PB classes. On exception, could do as separate jira. Here is one that looks like its what you need that already exists, if it helps: HBASE-2030
[jira] [Resolved] (HBASE-2142) Add number of RegionServers (live/dead) to JMX metrics in HMaster
[ https://issues.apache.org/jira/browse/HBASE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-2142. -- Resolution: Duplicate Marking duplicate of HBASE-5325, which does better than this issue asks for, giving you actual names of live and dead servers:
{code}
+  /**
+   * Get the live region servers
+   * @return Live region servers
+   */
+  public Map<String, HServerLoad> getRegionServers();
+
+  /**
+   * Get the dead region servers
+   * @return Dead region Servers
+   */
+  public String[] getDeadRegionServers();
{code}
Add number of RegionServers (live/dead) to JMX metrics in HMaster - Key: HBASE-2142 URL: https://issues.apache.org/jira/browse/HBASE-2142 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.20.2 Reporter: Lars George Priority: Minor While commenting on HBASE-2117 I noticed that Hadoop's NameNode has that and it makes sense to expose it too in HBase's HMaster metrics.
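For context on how an interface like the one quoted above gets surfaced over JMX, the standard-MBean pattern is a management interface named `<ImplClass>MBean` plus a registration against the platform `MBeanServer`. A toy sketch, with an invented bean name and hard-coded server lists (this is not HBase's actual master bean):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MasterStatusDemo {
    // Standard-MBean contract: management interface named <ImplClass>MBean.
    public interface MasterStatusMBean {
        String[] getLiveRegionServers();
        String[] getDeadRegionServers();
    }

    public static class MasterStatus implements MasterStatusMBean {
        // Hard-coded example data; a real bean would query cluster state.
        public String[] getLiveRegionServers() {
            return new String[] {"rs1:60020", "rs2:60020"};
        }
        public String[] getDeadRegionServers() {
            return new String[] {"rs3:60020"};
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // The domain and key below are invented for the example.
        ObjectName name = new ObjectName("example:type=MasterStatus");
        mbs.registerMBean(new MasterStatus(), name);
        // The getter getLiveRegionServers() surfaces as attribute "LiveRegionServers".
        String[] live = (String[]) mbs.getAttribute(name, "LiveRegionServers");
        System.out.println(live.length); // prints 2
    }
}
```

Once registered, the attributes show up in jconsole or any JMX client, which is what makes "actual names of live and dead servers" visible without grepping logs.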
[jira] [Updated] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5325: - Resolution: Fixed Fix Version/s: 0.92.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed branch and trunk. Thanks for the nice patch and for being accommodating of feedback, Hitesh. Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-5351: -- Attachment: HBASE-5351-v2.patch *attached HBASE-5351-v2.patch* You are quite right -- createTable catches the SocketTimeoutException anyway.
[jira] [Commented] (HBASE-1762) Remove concept of ZooKeeper from HConnection interface
[ https://issues.apache.org/jira/browse/HBASE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215946#comment-13215946 ] stack commented on HBASE-1762: -- This is being done as part of HBASE-5399 Remove concept of ZooKeeper from HConnection interface -- Key: HBASE-1762 URL: https://issues.apache.org/jira/browse/HBASE-1762 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.0 Reporter: Ken Weiner Assignee: stack Attachments: HBASE-1762.patch The concept of ZooKeeper is really an implementation detail and should not be exposed in the {{HConnection}} interface. Therefore, I suggest removing the {{HConnection.getZooKeeperWrapper()}} method from the interface. I couldn't find any uses of this method within the HBase code base except for in one of the unit tests: {{org.apache.hadoop.hbase.TestZooKeeper}}. This unit test should be changed to instantiate the implementation of {{HConnection}} directly, allowing it to use the {{getZooKeeperWrapper()}} method. This requires making {{org.apache.hadoop.hbase.client.HConnectionManager.TableServers}} public. (I actually think TableServers should be moved out into an outer class, but in the spirit of small patches, I'll refrain from suggesting that in this issue). I'll attach a patch for: # The removal of {{HConnection.getZooKeeperWrapper()}} # Change of {{TableServers}} class from private to public # Direct instantiation of {{TableServers}} within {{TestZooKeeper}}.
[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble
[ https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215945#comment-13215945 ] stack commented on HBASE-5399: -- Another thought: Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them -- you'd have no way of counting zk and master connections?) Cut the link between the client and the zookeeper ensemble -- Key: HBASE-5399 URL: https://issues.apache.org/jira/browse/HBASE-5399 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 5399_inprogress.v9.patch The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connections that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection. There are choices to be made considering the existing API (that we don't want to break). The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken. ZooKeeper is used for:
- public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
- read get master address to create a master => now done with a temporary zookeeper connection
- read root location => now done with a temporary zookeeper connection, but questionable. Used in public function locateRegion. To be reworked.
- read cluster id => now done once with a temporary zookeeper connection.
- check if base node is available => now done once with a zookeeper connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection.
- Called internally from HBaseAdmin and HTable - getCurrentNrHRS(): public function to get the number of region servers and create a pool of threads = now done with a temporary zookeeper connection - Master is used for: - getMaster public getter, as for ZooKeeper = we have to deprecate this but keep it. - isMasterRunning(): public function, used internally by HMerge and HBaseAdmin - getHTableDescriptor*: public functions offering access to the master. = we could make them use a temporary master connection as well. Main points are: - the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a strongly coupled architecture ;-). This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, a non-connected client will always be slower, because it's a tcp connection, and establishing a tcp connection is slow. - having a link between ZK and all the clients seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client - if we move the table descriptor part away from the client, we need to find a new place for it. - we will have the same issue with HBaseAdmin (for both ZK and Master); maybe we can put a timeout on the connection. That would make the whole system less deterministic however. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
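The "temporary zookeeper connection" pattern described above can be sketched generically: open a short-lived connection, read the single value needed, and close it immediately, so the client keeps no standing link to the ensemble. A minimal illustration with a stand-in reader type (`EnsembleReader`, `ReaderFactory`, and the znode path are hypothetical names for illustration, not the actual HBase or ZooKeeper API):

```java
public class TemporaryConnection {
    /** Hypothetical stand-in for a ZooKeeper client session. */
    public interface EnsembleReader extends AutoCloseable {
        String read(String znode);
        @Override void close();   // no checked exception in this sketch
    }

    /** Hypothetical factory that opens a fresh, short-lived session. */
    public interface ReaderFactory {
        EnsembleReader open();
    }

    /**
     * Opens a connection, reads a single znode, and always closes the
     * connection, so no long-lived ZK link is retained per client.
     */
    public static String readOnce(ReaderFactory factory, String znode) {
        try (EnsembleReader reader = factory.open()) {
            return reader.read(znode);
        }
    }
}
```

The cost of this design is exactly the trade-off the comment names: each read pays for TCP connection establishment, which is why a non-connected client is slower.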
[jira] [Commented] (HBASE-4932) Block cache can be mistakenly instantiated by tools
[ https://issues.apache.org/jira/browse/HBASE-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215947#comment-13215947 ] Prakash Khemani commented on HBASE-4932: Yes ... it is a good-to-have patch. Thanks. Block cache can be mistakenly instantiated by tools --- Key: HBASE-4932 URL: https://issues.apache.org/jira/browse/HBASE-4932 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani Fix For: 0.94.0 Attachments: HBASE-4932.patch Map Reduce tasks that create a writer to write HFiles inadvertently end up creating a block cache.
[jira] [Resolved] (HBASE-2310) Review how hbase does addresses throughout including in logs, ui and in code
[ https://issues.apache.org/jira/browse/HBASE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-2310. -- Resolution: Later Resolving as later. It's a silly general task that just won't get done. Review how hbase does addresses throughout including in logs, ui and in code Key: HBASE-2310 URL: https://issues.apache.org/jira/browse/HBASE-2310 Project: HBase Issue Type: Task Reporter: stack HBASE-2174 fixed the issue where we were doing a dns lookup on each heartbeat, and it adds the hostname rather than the IP into the .META. table. This issue takes over from hbase-2174 to make it so we run through all of hbase making sure we are consistent in our use of hostname rather than IP everywhere. See HBASE-2174 for other background that'll help with this issue.
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215973#comment-13215973 ] stack commented on HBASE-5075: -- This issue seems to be like 'HBASE-2342 Consider adding a watchdog node next to region server' regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows about the regionserver's shutdown, it takes a long time to recover the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I assume the rs is down, delete the znode, and force-close the hlog file. The detection period could then be around 100ms.
[jira] [Resolved] (HBASE-2351) publish hadoop + patch artifacts under org.apache.hbase groupId
[ https://issues.apache.org/jira/browse/HBASE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-2351. -- Resolution: Won't Fix This issue is no longer applicable now that we run against published hadoops w/o need of patches. publish hadoop + patch artifacts under org.apache.hbase groupId Key: HBASE-2351 URL: https://issues.apache.org/jira/browse/HBASE-2351 Project: HBase Issue Type: Sub-task Components: build Reporter: Karthik K Similarly, the trunk of hbase currently depends on a couple of patches on top of the hadoop 0.20.2 release, which is being actively worked on at HBASE-2255. Once that experiment succeeds, before the 0.21.0 release, the artifacts need to be published under groupId org.apache.hbase and artifactId 'hadoop-p1-p2-p3', say (where p1, p2 and p3 are patch numbers). The final pom.xml of hbase should be devoid of external references for better maintainability.
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215981#comment-13215981 ] Jesse Yates commented on HBASE-5075: Yeah, very similar. Same issues with that ticket as before, namely wanting to keep HBase as simple and minimal as we can justify. regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the hmaster, and once the hmaster knows about the regionserver's shutdown, it takes a long time to recover the hlog's lease. hbase is an online db, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I assume the rs is down, delete the znode, and force-close the hlog file. The detection period could then be around 100ms.
[jira] [Resolved] (HBASE-2675) Quick smoke tests testsuite
[ https://issues.apache.org/jira/browse/HBASE-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-2675. -- Resolution: Fixed Resolving as fixed by the default run of small tests (reopen if not sufficient in your estimation B) Quick smoke tests testsuite - Key: HBASE-2675 URL: https://issues.apache.org/jira/browse/HBASE-2675 Project: HBase Issue Type: Test Reporter: Benoit Sigoure Assignee: nkeywal Priority: Minor It would be nice if there was a known subset of the tests that run fast (e.g. not more than a few seconds) and quickly help us check whether the code isn't horribly broken. This way one could run those tests at a frequent interval when iterating and only run the entire testsuite at the end, when they think they're done, since doing so is very time consuming. Someone would need to identify which tests really focus on the core functionality and add a target in the build system to just run those tests. As a bonus, it would be awesome++ if the core tests ran, say, 10x faster than they currently do. There's a lot of sleep-based synchronization in the tests and it would be nice to remove some of that where possible to make the tests run as fast as the machine can handle them.
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5351: - Status: Open (was: Patch Available) hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.
{code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_R
{code}
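The fix sketched in the diff above boils down to polling table availability through the admin interface, with a bounded retry count, instead of asking a connection that was created before the async table creation finished. The retry skeleton can be written generically against a stand-in availability check (the constant value, sleep interval, and method names here are illustrative, not the actual HBase code):

```java
import java.util.function.BooleanSupplier;

public class WaitForTable {
    // Illustrative bound; the real constant in LoadIncrementalHFiles may differ.
    static final int TABLE_CREATE_MAX_RETRIES = 20;

    /**
     * Polls an availability check until it reports true or the retry budget
     * is exhausted. Returns the number of sleeps performed, mirroring the
     * ctr variable in the diff above.
     */
    public static int waitUntilAvailable(BooleanSupplier isAvailable, long sleepMillis) {
        int ctr = 0;
        while (!isAvailable.getAsBoolean() && ctr < TABLE_CREATE_MAX_RETRIES) {
            try {
                Thread.sleep(sleepMillis);   // back off before re-checking
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ctr++;
        }
        return ctr;
    }
}
```

In the real patch the supplier would be `this.hbAdmin.isTableAvailable(tableName)`; the key point is that the check goes through the admin rather than a prematurely created HTable's connection.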
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5351: - Status: Patch Available (was: Open) hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.
{code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_R
{code}
[jira] [Commented] (HBASE-5183) Render the monitored tasks as a treeview
[ https://issues.apache.org/jira/browse/HBASE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215991#comment-13215991 ] Mubarak Seyed commented on HBASE-5183: -- I believe we need to present the monitored tasks as TreeView + Table (treeTable). The first column is a tree with root/node/leaf; the 2nd to 5th show startTime, description, state and status. Something like http://ludo.cubicphuse.nl/jquery-plugins/treeTable/doc/ How do we group data? Option 1: Group-by StartTime
{code}
Start time | Description | State | Status
+ Mon Feb 20 15:10:08 PST 2012
  IPC Server handler 99 on 6  WAITING (since 4mins, 8sec ago)  Waiting for a call (since ..)
  IPC Server handler 20 on 6  WAITING (since 2mins, 1sec ago)  Waiting for a call (since ..)
+ Mon Feb 22 17:18:18 PST 2012
  IPC Server handler 40 on 6  WAITING (since 0mins, 30sec ago)  Waiting for a call (since ..)
{code}
Option 2: Group-by State Option 3: Group-by Status I believe StartTime is almost the same for all the IPC server handlers, as the Master or RS pages show the same startTime (depending on when we restart daemons). Options 2 and 3 are more of a text-based grouping. Thoughts? Render the monitored tasks as a treeview Key: HBASE-5183 URL: https://issues.apache.org/jira/browse/HBASE-5183 Project: HBase Issue Type: Sub-task Reporter: Zhihong Yu Assignee: Mubarak Seyed Fix For: 0.92.2, 0.94.0 Andy made the suggestion here: https://issues.apache.org/jira/browse/HBASE-5174?focusedCommentId=13184571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13184571
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5455: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the idea and patch! Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Assignee: Michael Drzal Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5455.diff HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type.
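The proposed guard can be illustrated without the HBase classes: simulate the counter-based assignment, freeze the expected class-to-code mapping once, and assert equality, so an insertion in the middle of the block shifts later codes and fails loudly. The class names and codes below are made up for illustration; the real test would pin the actual HbaseObjectWritable table:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodeStability {
    /**
     * Mimics HbaseObjectWritable's static block: codes come from an
     * incrementing local counter, so insertion order defines the wire codes.
     */
    public static Map<String, Integer> assignCodes(String... classNames) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        int code = 1;
        for (String name : classNames) {
            codes.put(name, code++);
        }
        return codes;
    }

    /** Returns true iff every class still maps to its frozen code. */
    public static boolean matchesFrozen(Map<String, Integer> actual,
                                        Map<String, Integer> frozen) {
        return actual.equals(frozen);
    }
}
```

Because the frozen map is written out literally in the test, an intentional change forces a visible, reviewed update to the expected ids.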
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5351: - Attachment: HBASE-5351-v2.patch Uploading same patch so can redo submit patch to hadoopqa hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, HBASE-5351-v2.patch, HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.
{code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_R
{code}
[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5351: - Status: Patch Available (was: Open) hbase completebulkload to a new table fails in a race - Key: HBASE-5351 URL: https://issues.apache.org/jira/browse/HBASE-5351 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, HBASE-5351-v2.patch, HBASE-5351.patch I have a test that tests vanilla use of importtsv with importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows: 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781) at 
org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707) The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.
{code}
-HTable table = new HTable(this.cfg, tableName);
-
-HConnection conn = table.getConnection();
 int ctr = 0;
-while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MA
+while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_R
{code}
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216009#comment-13216009 ] Kannan Muthukkaruppan commented on HBASE-5416: -- +1 to what Mikhail said. Max: This is an interesting use case. I will take a closer look at the changes. But if it is indeed the case that the set of rows you need to look up in the second CF is a small % of the total data in that CF, then issuing subsequent gets (point lookups) for the relevant keys in that CF should work reasonably well, correct? BTW, are you doing this using HTableInputFormat? Perhaps you can detail the structure of your MR job more, and we can work through some specific options. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed. But when a scan is performed on several CFs and the filter checks only data from a subset of those CFs, data from the CFs not checked by the filter is not needed at the filter stage; it is needed only once we have decided to include the current row. In such a case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter. For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. 
But the current implementation loads both CFs to perform the scan, when only a small subset is needed. The attached patch adds one routine to the Filter interface to allow a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
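The two-group scanner idea above reads in two phases: load only the filter's CFs, apply the filter, and fetch the remaining (joined) CFs only for accepted rows. A toy in-memory simulation of that control flow (the row model, CF names, and the load counter are illustrative; the real patch works inside HRegion's scanner machinery, not on maps):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class TwoPhaseScan {
    /** Counts how often the expensive CF is actually read in this sketch. */
    public static int largeCfLoads = 0;

    /**
     * Scans rows by checking the cheap "flags" CF first and loading the
     * expensive "snap" CF only for rows the filter accepts.
     */
    public static List<String> scan(Map<String, String> flagsCf,
                                    Map<String, String> snapCf,
                                    Predicate<String> flagFilter) {
        largeCfLoads = 0;
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : flagsCf.entrySet()) {
            if (flagFilter.test(row.getValue())) { // phase 1: filter CF only
                largeCfLoads++;                    // phase 2: big CF, accepted rows only
                results.add(row.getKey() + "=" + snapCf.get(row.getKey()));
            }
        }
        return results;
    }
}
```

With a selective filter, the number of expensive loads drops from "every row" to "accepted rows only", which is where the reported 30-50x comes from on the flags/snap workload.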
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216019#comment-13216019 ] stack commented on HBASE-4991: -- bq. I feel some of the recent proposals / requirements are far more complex than the one Yeah. It seemed basic back in December. bq. There wasn't such requirement when Mubarak outlined his plan Pardon me. I should have noticed the plan but did not. Other priorities. If I'd seen the plan I'd have blanched I think. bq. Of course, having generic framework for all the master-coordinated tasks allows future additions to be concise. Yep. We'd have tested, proven primitives to build stuff on rather than have to do it per feature bq. But I think that should have been outlined clearly in the early stage of development of a feature. See above. Pardon me for missing how involved this addition became. I don't see how the plan of '01/Feb/12 07:43' lays the foundation for a generic framework. Am I missing something? It seems like it's code for this feature only? Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions.
[jira] [Updated] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5350: - Attachment: jamon_HBASE-5350.patch Reattach so can redo hadoopqa Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with rest of source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5350: - Status: Patch Available (was: Open) Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with rest of source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5350) Fix jamon generated package names
[ https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5350: - Status: Open (was: Patch Available) Fix jamon generated package names - Key: HBASE-5350 URL: https://issues.apache.org/jira/browse/HBASE-5350 Project: HBase Issue Type: Bug Components: monitoring Affects Versions: 0.92.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.94.0 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch Previously, jamon was creating the template files in org.apache.hbase, but it should be org.apache.hadoop.hbase, so it's in line with rest of source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5473) Metrics does not push pread time
Metrics does not push pread time Key: HBASE-5473 URL: https://issues.apache.org/jira/browse/HBASE-5473 Project: HBase Issue Type: Bug Components: metrics Reporter: dhruba borthakur Priority: Minor The RegionServerMetrics is not pushing the pread times to the MetricsRecord -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5473) Metrics does not push pread time
[ https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5473: Assignee: dhruba borthakur Status: Patch Available (was: Open) Metrics does not push pread time Key: HBASE-5473 URL: https://issues.apache.org/jira/browse/HBASE-5473 Project: HBase Issue Type: Bug Components: metrics Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor The RegionServerMetrics is not pushing the pread times to the MetricsRecord -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216044#comment-13216044 ] Phabricator commented on HBASE-5442: mbautin has commented on the revision [jira] [HBASE-5442] [89-fb] Use builder pattern in StoreFile and HFile. This passed all unit tests. REVISION DETAIL https://reviews.facebook.net/D1941 Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.94.0 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
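The {code} excerpt in the issue description only shows the call site. A minimal, self-contained sketch of such a builder is below; the parameters (path, block size, compression) are made up for illustration and are not the actual HFile WriterBuilder surface from the patches.

```java
// Builder-pattern sketch: defaults live in one place, each setter sits on its
// own line at the call site, and there is a single construction path instead
// of a dozen overloaded constructors.
public class WriterBuilderSketch {

    public static final class Writer {
        public final String path;
        public final int blockSize;
        public final String compression;
        Writer(String path, int blockSize, String compression) {
            this.path = path; this.blockSize = blockSize; this.compression = compression;
        }
        public String toString() {
            return path + " blockSize=" + blockSize + " compression=" + compression;
        }
    }

    public static final class WriterBuilder {
        private String path;
        private int blockSize = 65536;       // defaults stated once, here...
        private String compression = "none"; // ...never repeated at call sites
        public WriterBuilder withPath(String path) { this.path = path; return this; }
        public WriterBuilder withBlockSize(int blockSize) { this.blockSize = blockSize; return this; }
        public WriterBuilder withCompression(String compression) { this.compression = compression; return this; }
        public Writer build() {
            if (path == null) throw new IllegalStateException("path is required");
            return new Writer(path, blockSize, compression);
        }
    }

    public static void main(String[] args) {
        // One setter per line, as the issue suggests, so cherry-picks that
        // touch different parameters merge cleanly across branches.
        Writer w = new WriterBuilder()
            .withPath("/tmp/f")
            .withCompression("gz")
            .build();
        System.out.println(w); // blockSize falls back to the 64k default
    }
}
```

Because unspecified parameters fall back to defaults inside build(), adding a new parameter does not touch any existing call site, which is exactly the merge/cherry-pick property the issue is after.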
[jira] [Updated] (HBASE-5473) Metrics does not push pread time
[ https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5473: --- Attachment: D1947.1.patch dhruba requested code review of [jira] [HBASE-5473] Metrics does not push pread time. Reviewers: sc, tedyu Metrics does not push pread time. TEST PLAN All unit tests pass. Also deployed on my local test cluster. REVISION DETAIL https://reviews.facebook.net/D1947 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/4119/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. Metrics does not push pread time Key: HBASE-5473 URL: https://issues.apache.org/jira/browse/HBASE-5473 Project: HBase Issue Type: Bug Components: metrics Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch The RegionServerMetrics is not pushing the pread times to the MetricsRecord -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216053#comment-13216053 ] Phabricator commented on HBASE-5357: Kannan has commented on the revision [jira] [HBASE-5357] [89-fb] Refactoring: use the builder pattern for HColumnDescriptor. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:486 INTEGER.MAX_VALUE seems to be the block size. Not sure why it was this in the past. But the new behavior, you are defaulting back to the default (64k). src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:544 ditto. REVISION DETAIL https://reviews.facebook.net/D1929 Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, D1929.1.patch, D1929.2.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
Shared the multiput thread pool for all the HTable instance --- Key: HBASE-5474 URL: https://issues.apache.org/jira/browse/HBASE-5474 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Currently, each HTable instance has its own thread pool for the multiput operation. Each thread pool is actually an unbounded cached thread pool. So wouldn't it increase efficiency if HTable could share this unbounded cached thread pool across all HTable instances? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216059#comment-13216059 ] Phabricator commented on HBASE-5357: mbautin has commented on the revision [jira] [HBASE-5357] [89-fb] Refactoring: use the builder pattern for HColumnDescriptor. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:486 Yes, that block size looked like a bug to me, so I left it out. If someone explains to me why that was reasonable, I would be happy to add it back. REVISION DETAIL https://reviews.facebook.net/D1929 Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, D1929.1.patch, D1929.2.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5442: --- Attachment: D1941.2.patch mbautin updated the revision [jira] [HBASE-5442] [89-fb] Use builder pattern in StoreFile and HFile. Reviewers: JIRA, khemani, Kannan, Liyin, Karthik, nspiegelberg Removing irrelevant javadoc from StoreFile writer constructor REVISION DETAIL https://reviews.facebook.net/D1941 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFilePerformance.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.94.0 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, D1941.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216066#comment-13216066 ] stack commented on HBASE-5075: -- Rather than write a new supervisor, why not use something old school like http://supervisord.org/ ? A wrapper script could clear the old znode from zk before restarting the new RS instance? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster knows of the regionserver's shutdown, it takes a long time to recover the hlog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I consider the RS down, delete the znode, and force-close the hlog file. That way the detection period may be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216065#comment-13216065 ] Hadoop QA commented on HBASE-5441: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515639/HBASE-5441.D1857.4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1047//console This message is automatically generated. HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. 
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy8.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649)
	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:108)
	at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.<init>(ThriftServerRunner.java:516)
	at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.<init>(HRegionThriftServer.java:104)
	at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.<init>(HRegionThriftServer.java:74)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658)
	at java.lang.Thread.run(Thread.java:662)
2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216069#comment-13216069 ] Hudson commented on HBASE-5325: --- Integrated in HBase-0.92 #303 (See [https://builds.apache.org/job/HBase-0.92/303/]) HBASE-5325 Expose basic information about the master-status through jmx beans (Revision 1293417) Result = SUCCESS stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MXBean.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MXBean.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MXBeanImpl.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMXBean.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMXBean.java Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.92.1, 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
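For a sense of the mechanism HBASE-5325 uses, a minimal MXBean registered with the JVM's platform MBean server looks like this. The interface name and attribute below are invented for the demo; the real beans live under org.apache.hadoop.hbase.master and org.apache.hadoop.hbase.regionserver as MXBean/MXBeanImpl.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical master-status bean: a getter on an interface whose name ends
// in "MXBean" is automatically exposed as a read-only JMX attribute.
public class MasterStatusMXBeanSketch {

    // By JMX convention the management interface's name must end in "MXBean".
    public interface MasterStatusMXBean {
        int getRegionServerCount();
    }

    public static class MasterStatus implements MasterStatusMXBean {
        public int getRegionServerCount() { return 3; } // stub value for the demo
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch:type=MasterStatus");
        server.registerMBean(new MasterStatus(), name);
        // A JMX client (jconsole, etc.) would now see the attribute;
        // here we read it back through the server locally.
        Object count = server.getAttribute(name, "RegionServerCount");
        System.out.println("RegionServerCount = " + count);
    }
}
```

The getter getRegionServerCount() surfaces as the attribute "RegionServerCount", which is how tools like jconsole discover the exposed master status without any extra wiring.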
[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5474: -- Description: Currently, each HTable instance has its own thread pool for the multiput operation. Each thread pool is actually a cached thread pool, which is bounded by the number of region servers, so the maximum number of threads will be (# region servers * # HTable instances). On the other hand, if all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. was:Currently, each HTable instance will have a thread pool for the multiput operation. Each thread pool is actually a unbounded cached thread pool. So it would increase the efficiency if HTable could share this unbounded cached thread pool across all the HTable instance ? Shared the multiput thread pool for all the HTable instance --- Key: HBASE-5474 URL: https://issues.apache.org/jira/browse/HBASE-5474 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Currently, each HTable instance has its own thread pool for the multiput operation. Each thread pool is actually a cached thread pool, which is bounded by the number of region servers, so the maximum number of threads will be (# region servers * # HTable instances). On the other hand, if all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
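A minimal sketch of the sharing idea from the description: every table instance submits multiput work to one JVM-wide cached pool instead of creating its own. The class and method names here are illustrative, not the actual HBASE-5474 patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SharedPoolSketch {

    // One cached pool for the whole JVM (daemon threads so the demo exits
    // cleanly). With sharing, the worst case stays roughly (# region servers)
    // threads instead of (# region servers * # table instances).
    private static final ExecutorService SHARED_POOL = new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>(),
        r -> { Thread t = new Thread(r); t.setDaemon(true); return t; });

    public static final class Table {
        private final String name;
        public Table(String name) { this.name = name; }
        public Future<String> multiput(String batch) {
            // Every Table reuses SHARED_POOL rather than owning a pool.
            return SHARED_POOL.submit(() -> name + " wrote " + batch);
        }
    }

    public static void main(String[] args) throws Exception {
        Table t1 = new Table("t1");
        Table t2 = new Table("t2");
        System.out.println(t1.multiput("batch-a").get());
        System.out.println(t2.multiput("batch-b").get());
    }
}
```

Because a cached pool reuses idle workers, two tables that multiput at different times can be served by the same thread, which is the efficiency gain the updated description points at.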
[jira] [Commented] (HBASE-5075) regionserver crashed and failover
[ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216078#comment-13216078 ] stack commented on HBASE-5075: -- Looking in the HRegionServer code, it looks like we delete our znode on the way out already. Someone had your idea already Jesse: {code} try { deleteMyEphemeralNode(); } catch (KeeperException e) { LOG.warn("Failed deleting my ephemeral node", e); } {code} Maybe this is broke? regionserver crashed and failover - Key: HBASE-5075 URL: https://issues.apache.org/jira/browse/HBASE-5075 Project: HBase Issue Type: Improvement Components: monitoring, regionserver, replication, zookeeper Affects Versions: 0.92.1 Reporter: zhiyuan.dai Fix For: 0.90.5 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch When a regionserver crashes, it takes too long to notify the HMaster, and once the HMaster knows of the regionserver's shutdown, it takes a long time to recover the hlog's lease. HBase is an online DB, so availability is very important. I have an idea to improve availability: a monitor node checks the regionserver's pid; if the pid does not exist, I consider the RS down, delete the znode, and force-close the hlog file. That way the detection period may be about 100ms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5473) Metrics does not push pread time
[ https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216077#comment-13216077 ] Phabricator commented on HBASE-5473: sc has commented on the revision [jira] [HBASE-5473] Metrics does not push pread time. looks good to me REVISION DETAIL https://reviews.facebook.net/D1947 Metrics does not push pread time Key: HBASE-5473 URL: https://issues.apache.org/jira/browse/HBASE-5473 Project: HBase Issue Type: Bug Components: metrics Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch The RegionServerMetrics is not pushing the pread times to the MetricsRecord -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216082#comment-13216082 ]

Phabricator commented on HBASE-5357:

stack has commented on the revision "[jira] [HBASE-5357] [89-fb] Refactoring: use the builder pattern for HColumnDescriptor".

  Sounds like the old stuff was wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1929

Use builder pattern in HColumnDescriptor

Key: HBASE-5357
URL: https://issues.apache.org/jira/browse/HBASE-5357
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, D1929.1.patch, D1929.2.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch

We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
    .setParameter1(value1)
    .setParameter2(value2)
    ...
    .build();
{code}

Each parameter setter being on its own line will make merges/cherry-picks work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442.
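The chained style proposed above can be sketched with a plain Java builder. The ColumnFamily class, its fields, and its defaults below are hypothetical stand-ins for illustration, not the actual HColumnDescriptor API:

```java
// Hypothetical, simplified stand-in for HColumnDescriptor to illustrate
// the builder shape; the field names and default values are illustrative.
class ColumnFamily {
    private final String name;
    private final int maxVersions;
    private final String compression;

    private ColumnFamily(Builder b) {
        this.name = b.name;
        this.maxVersions = b.maxVersions;
        this.compression = b.compression;
    }

    static Builder newBuilder(String name) { return new Builder(name); }

    String getName() { return name; }
    int getMaxVersions() { return maxVersions; }
    String getCompression() { return compression; }

    static class Builder {
        private final String name;
        private int maxVersions = 3;          // defaults survive unmentioned
        private String compression = "NONE";

        Builder(String name) { this.name = name; }

        // Each setter returns the builder, so one parameter sits per line.
        Builder setMaxVersions(int v) { this.maxVersions = v; return this; }
        Builder setCompression(String c) { this.compression = c; return this; }
        ColumnFamily build() { return new ColumnFamily(this); }
    }
}
```

Because each setter occupies exactly one line, adding or removing a parameter is a one-line diff, which is what makes merges and cherry-picks across branches conflict-free.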
[jira] [Commented] (HBASE-5473) Metrics does not push pread time
[ https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216084#comment-13216084 ]

Phabricator commented on HBASE-5473:

stack has accepted the revision "[jira] [HBASE-5473] Metrics does not push pread time".

  +1

REVISION DETAIL
  https://reviews.facebook.net/D1947

BRANCH
  svn

Metrics does not push pread time

Key: HBASE-5473
URL: https://issues.apache.org/jira/browse/HBASE-5473
Project: HBase
Issue Type: Bug
Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch

The RegionServerMetrics is not pushing the pread times to the MetricsRecord
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216089#comment-13216089 ]

Lars Hofhansl commented on HBASE-4991:

Maybe we should separate this feature from a generic framework? For this issue we could just have one API: deleteRange(table, startKey, endKey). Initially it could validate that the start and end key coincide with exactly one region; that way we can extend this later without having regions exposed in the API. (We still need to avoid races with splitting and balancing, of course, which makes it almost nicer to go back to the original approach of passing a region name.) Just my $0.02.

Provide capability to delete named region

Key: HBASE-4991
URL: https://issues.apache.org/jira/browse/HBASE-4991
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
Fix For: 0.94.0
Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch

See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss. Users may want to quickly dispose of out-of-date records by deleting specific regions.
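The validation step described above (the start and end key must coincide with exactly one region) can be sketched as follows. The region model, class, and method names here are hypothetical, and String keys stand in for byte[] for brevity:

```java
import java.util.List;

// Hypothetical sketch of the proposed deleteRange(table, startKey, endKey)
// precondition: accept the call only when [startKey, endKey) lines up
// exactly with one region's boundaries. Regions are modeled as a sorted
// list of region start keys; not actual HBase client code.
class DeleteRangeCheck {
    // Region i spans [regionStartKeys.get(i), regionStartKeys.get(i + 1));
    // the last region is unbounded above, modeled here as a null endKey.
    static boolean matchesExactlyOneRegion(
            List<String> regionStartKeys, String startKey, String endKey) {
        int i = regionStartKeys.indexOf(startKey);
        if (i < 0) {
            return false;  // startKey must fall on a region boundary
        }
        boolean last = (i == regionStartKeys.size() - 1);
        if (last) {
            return endKey == null;  // last region is open-ended
        }
        return regionStartKeys.get(i + 1).equals(endKey);
    }
}
```

Rejecting anything that spans more or less than one region keeps regions out of the public API while leaving room to relax the check later, as the comment suggests.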
[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race
[ https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216093#comment-13216093 ]

Hadoop QA commented on HBASE-5351:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515977/HBASE-5351-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 javadoc. The javadoc tool appears to have generated -131 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1046//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1046//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1046//console

This message is automatically generated.

hbase completebulkload to a new table fails in a race

Key: HBASE-5351
URL: https://issues.apache.org/jira/browse/HBASE-5351
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, HBASE-5351-v2.patch, HBASE-5351.patch

I have a test that tests vanilla use of importtsv with the importtsv.bulk.output option followed by completebulkload to a new table. This sometimes fails as follows:

11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: ml_items_copy, row=ml_items_copy,,99
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
  at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
  at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
  at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
  at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
  at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
  at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
  at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
  at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
  at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)

The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then creating an HTable object before that call has actually completed. The following change to /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java appears to fix the problem, but I have not been able to reproduce the race reliably, in order to write a test.

{code}
-    HTable table = new HTable(this.cfg, tableName);
-
-    HConnection conn = table.getConnection();
     int ctr = 0;
-    while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MA
+
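The fix described above amounts to polling for table availability after the asynchronous create, before opening the table. A minimal sketch of that pattern, assuming a hypothetical ClusterClient interface as a stand-in for the real HBaseAdmin/HConnection API:

```java
// Hypothetical sketch of "create the table asynchronously, then wait until
// it is available" to close the race described above. ClusterClient is a
// stand-in interface, not the actual HBase client API.
class WaitForTable {
    interface ClusterClient {
        void createTableAsync(String tableName);
        boolean isTableAvailable(String tableName);
    }

    static final int MAX_RETRIES = 20;      // illustrative bound
    static final long RETRY_SLEEP_MS = 50;  // illustrative poll interval

    // Returns true once the table is visible, false if retries run out.
    static boolean createAndWait(ClusterClient client, String tableName) {
        client.createTableAsync(tableName);
        for (int ctr = 0; ctr < MAX_RETRIES; ctr++) {
            if (client.isTableAvailable(tableName)) {
                return true;  // only now is it safe to open the table
            }
            try {
                Thread.sleep(RETRY_SLEEP_MS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;  // caller sees failure, interrupt preserved
            }
        }
        return false;
    }
}
```

Bounding the retries keeps a master-side failure from hanging the load forever; a caller would treat a false return as a table-creation error.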
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216095#comment-13216095 ]

Jimmy Xiang commented on HBASE-3909:

Can we put the dynamic configuration somewhere in HDFS, for example in some file under hbase.rootdir? We can keep static configuration in hbase-site.xml and dynamic configuration in a file under hbase.rootdir. We can also enhance the hbase shell or the master UI to view/change those dynamic configurations.

Add dynamic config

Key: HBASE-3909
URL: https://issues.apache.org/jira/browse/HBASE-3909
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.94.0

I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but there is no harm in this having its own issue. Ted started a conversation on this topic up on dev, and Todd suggested we look at how Hadoop did it over in HADOOP-7001.
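The suggestion above (static settings in hbase-site.xml, dynamic settings in their own file) can be sketched as a reload-on-change wrapper. The class, the file location, and the property name used in the comments are hypothetical, not actual HBase code:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: keep dynamic settings in a separate properties file
// (e.g. somewhere under hbase.rootdir) and reload it only when its
// modification time changes, so reads stay cheap between edits.
class DynamicConfig {
    private final Path file;
    private long lastModified = -1;
    private Properties props = new Properties();

    DynamicConfig(Path file) { this.file = file; }

    // Check-and-reload on every read; callers always see the latest file.
    synchronized String get(String key, String defaultValue) throws IOException {
        long mtime = Files.getLastModifiedTime(file).toMillis();
        if (mtime != lastModified) {  // file changed since the last load
            Properties fresh = new Properties();
            try (Reader r = Files.newBufferedReader(file)) {
                fresh.load(r);
            }
            props = fresh;            // swap in the freshly loaded snapshot
            lastModified = mtime;
        }
        return props.getProperty(key, defaultValue);
    }
}
```

A real implementation against HDFS would use FileSystem.getFileStatus rather than java.nio, and a shell or master-UI editor would simply rewrite the file; the mtime check makes the change visible on the next read.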