[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215477#comment-13215477
 ] 

Hadoop QA commented on HBASE-5317:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12515901/HBASE-5317to0.92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1040//console

This message is automatically generated.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, 
 HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, 
 HBASE-5317-v6.patch, HBASE-5317to0.92.patch, 
 TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-24 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215482#comment-13215482
 ] 

Mubarak Seyed commented on HBASE-4991:
--

bq. How does Accumulo do it do you know? You might get some ideas over there.
Will take a look. Todd's presentation highlights a comparison of HBase vs. Accumulo - 
http://www.slideshare.net/cloudera/h-base-and-accumulo-todd-lipcom-jan-25-2012

Source:
https://svn.apache.org/repos/asf/incubator/accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/fate/
(Master-coordinated tasks use Fate; refer to TStore.java) 
https://svn.apache.org/repos/asf/incubator/accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/master/Master.java

Notes from TStore.java
{code}
/**
 * Transaction Store: a place to save transactions
 * 
 * A transaction consists of a number of operations. To use, first create a 
transaction id, and then seed the
 * transaction with an initial operation. An executor service can then execute 
the transaction's operation,
 * possibly pushing more operations onto the transaction as each step 
successfully completes.
 * If a step fails, the stack can be unwound, undoing each operation.
 */
{code}

For example, the delete-range operation in the master uses Fate to seed a transaction 
with a DELETE_RANGE table operation and submit a task; an executor service can then 
execute the op.

{code}
public void executeTableOperation(TInfo tinfo, AuthInfo c, long opid,
    org.apache.accumulo.core.master.thrift.TableOperation op,
    List<ByteBuffer> arguments, Map<String,String> options, boolean autoCleanup) {

  case DELETE_RANGE: {
    String tableName = ByteBufferUtil.toString(arguments.get(0));
    Text startRow = ByteBufferUtil.toText(arguments.get(1));
    Text endRow = ByteBufferUtil.toText(arguments.get(2));

    final String tableId = checkTableId(tableName, TableOperation.DELETE_RANGE);
    checkNotMetadataTable(tableName, TableOperation.DELETE_RANGE);
    verify(c, tableId, TableOperation.DELETE_RANGE,
        check(c, SystemPermission.SYSTEM) || check(c, tableId, TablePermission.WRITE));

    fate.seedTransaction(opid,
        new TraceRepo<Master>(new TableRangeOp(MergeInfo.Operation.DELETE, tableId, startRow, endRow)),
        autoCleanup);
    break;
  }
}
{code}
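
For reference, the javadoc above boils down to a small set of operations on a 
transaction store. A minimal, hypothetical Java sketch of such an interface is 
below; the names (Repo, create, push, pop, isReady, call, undo) only loosely 
mirror the Accumulo FATE code, so consult TStore.java and Repo.java in the 
Accumulo source for the real signatures.
{code}
/** Sketch of a FATE-style repeatable step; hypothetical, not the Accumulo API. */
interface Repo<T> {
  /** Returns > 0 if the step must wait that many millis before running. */
  long isReady(long tid, T environment) throws Exception;
  /** Runs one step; may return the next Repo to push, or null when the transaction is done. */
  Repo<T> call(long tid, T environment) throws Exception;
  /** Unwinds this step if a later step fails. */
  void undo(long tid, T environment) throws Exception;
}

/** Sketch of the transaction store described by the TStore.java javadoc. */
interface TransactionStore<T> {
  long create();                      // create a transaction id
  void push(long tid, Repo<T> repo);  // seed the transaction / add an operation
  Repo<T> top(long tid);              // operation an executor should run next
  void pop(long tid);                 // current operation completed successfully
}
{code}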

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 A user may want to quickly dispose of out-of-date records by deleting specific 
 regions.





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v3.patch

Fixed all failed tests and added a test for the joined-scanner functionality.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch


 When a scan is performed, the whole row is loaded into the result list, and only 
 after that is the filter (if any) applied to decide whether the row is needed. 
 But when a scan covers several CFs and the filter checks data from only a subset 
 of them, the data from the CFs the filter does not check is not needed at the 
 filter stage - only once we have decided to include the current row. In such a 
 case we can significantly reduce the amount of IO performed by the scan by 
 loading only the values the filter actually checks. 
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (tens of GB) and quite costly to scan. If we need only the rows with some flag 
 set, we use a SingleColumnValueFilter to limit the result to a small subset of 
 the region. But the current implementation loads both CFs to perform the scan, 
 even though only a small subset is needed. 
 The attached patch adds one routine to the Filter interface that lets a filter 
 specify which CFs are needed for its operation. In HRegion, we separate all 
 scanners into two groups: those needed by the filter, and the rest (joined). When 
 a new row is considered, only the needed data is loaded and the filter applied; 
 only if the filter accepts the row is the rest of the data loaded. On our data, 
 this speeds up such scans 30-50 times. It also gives us a way to better normalize 
 the data into separate column families by optimizing the scans performed.
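
To make the flags/snap example concrete, the client-side scan for this use case 
looks roughly like the sketch below (standard 0.92-era client API; the table, 
family, and qualifier names are invented). The patch does not change this client 
code - it only changes how much data the region server reads while evaluating it: 
the snap family would be seeked into only for rows the filter accepts.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FlagFilteredScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "snapshots");   // hypothetical table name

    // Keep only rows whose small 'flags' CF has flag = 1; 'snap' is the large CF.
    SingleColumnValueFilter flagFilter = new SingleColumnValueFilter(
        Bytes.toBytes("flags"), Bytes.toBytes("flag"),
        CompareOp.EQUAL, Bytes.toBytes("1"));
    flagFilter.setFilterIfMissing(true);            // drop rows without the flag column

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("flags"));
    scan.addFamily(Bytes.toBytes("snap"));
    scan.setFilter(flagFilter);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        byte[] snapValue = row.getValue(Bytes.toBytes("snap"), Bytes.toBytes("data"));
        // process snapValue ...
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}
{code}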





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Patch Available  (was: Open)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215502#comment-13215502
 ] 

Zhihong Yu commented on HBASE-5416:
---

{code}
+  KeyValue nextKV = this.joinedHeap.peek();
+  while (true) {
+this.joinedHeap.next(results, limit - results.size());
+nextKV = this.joinedHeap.peek();
{code}
I think the first peek() isn't needed because there is another peek() inside 
the loop.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215504#comment-13215504
 ] 

Max Lapan commented on HBASE-5416:
--

@Thomas:

Yes, this is the primary goal of this patch. When CF_B is large, we'll load 
only the needed blocks from it (via seek), which can give a huge speedup in scans.

@Zhihong:

Thanks, I'll fix this; now waiting for the Jenkins results.
I didn't know about reviews.apache.org, thanks. I'll post there, of course :).

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215519#comment-13215519
 ] 

Hadoop QA commented on HBASE-5416:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12515904/Filtered_scans_v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -133 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 155 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1041//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1041//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1041//console

This message is automatically generated.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215521#comment-13215521
 ] 

Zhihong Yu commented on HBASE-5416:
---

The following line is too long:
{code}
+if (this.joinedHeap != null && this.joinedHeap.seek(KeyValue.createFirstOnRow(currentRow))) {
{code}
Please limit to 80 chars per line.

You can get Eclipse formatter from HBASE-3678.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Open  (was: Patch Available)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v4.patch

Fixed the comment, removed the extra peek() call, and folded the long line.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Patch Available  (was: Open)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215529#comment-13215529
 ] 

Max Lapan commented on HBASE-5416:
--

@Zhihong: I'm having trouble posting a new review request - it gives a 500 error. 
Maybe this is related to the Apache JIRA issues; I will try later.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215550#comment-13215550
 ] 

Max Lapan commented on HBASE-5416:
--

@stack:
Here is a documentation paragraph to include. I think it should go here: 
http://hbase.apache.org/book.html#number.of.cfs
{quote}
There is a performance consideration to keep in mind in schema design. In some situations, a schema with two (or more) column families can be significantly faster than a single-CF design. This is the case when one small column family is used to sieve the larger rows held in the other column families. If a SingleColumnValueFilter or SingleColumnValueExcludeFilter is used to find the needed rows, only the small column family is scanned; the other column families are loaded only once a matching row has been found. This can reduce the amount of data read significantly and lead to much faster scans.
{quote}
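
To make the suggested text concrete, a minimal sketch of creating such a two-CF 
table is below (invented table and family names; standard HBaseAdmin / 
HTableDescriptor client API of this era). The scan side would then use a 
SingleColumnValueFilter on the small family, as discussed earlier in this issue.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TwoFamilySchema {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("snapshots");  // hypothetical table name
    // Small CF holding only the sieve column(s) the filter inspects.
    desc.addFamily(new HColumnDescriptor("flags"));
    // Large CF holding the bulky payload that should be read only for matching rows.
    desc.addFamily(new HColumnDescriptor("snap"));

    admin.createTable(desc);
  }
}
{code}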

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Open  (was: Patch Available)

There is still a mistake somewhere; our stats scans return different results.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: Filtered_scans.patch, Filtered_scans_v2.patch, 
 Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread Michael Drzal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Drzal updated HBASE-5455:
-

Status: Open  (was: Patch Available)

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Priority: Minor
 Fix For: 0.94.0


 HbaseObjectWritable has a static initialization block that assigns ints to 
 various classes.  The int is assigned using a local variable that is 
 incremented after each use.  If someone adds a line in the middle of the 
 block, this throws off everything after the change and can break client 
 compatibility.  There is already a comment saying not to add/remove lines at 
 the beginning of this block.  It might make sense to have a test against a 
 static set of ids.  If something gets changed unintentionally, it would at 
 least fail the test.  If the change was intentional, at the very least the 
 test would need to be updated, and it would be a conscious decision.
 https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one 
 issue of this type.
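
A minimal, generic JUnit sketch of that idea is below. It does not touch 
HbaseObjectWritable itself; LegacyCodes is an invented stand-in for a static 
block that assigns codes with a post-incremented counter, and the real test 
would pin the actual class-to-code table instead.
{code}
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

public class TestFrozenClassCodes {

  /** Invented stand-in for HbaseObjectWritable's static initialization block. */
  static class LegacyCodes {
    static final Map<Class<?>, Integer> CLASS_TO_CODE = new HashMap<Class<?>, Integer>();
    static {
      int code = 1;
      // Inserting a line in the middle of this block shifts every code after it.
      CLASS_TO_CODE.put(Boolean.class, code++);
      CLASS_TO_CODE.put(Integer.class, code++);
      CLASS_TO_CODE.put(String.class, code++);
    }
  }

  /**
   * Pins the expected code of every entry. An unintentional reordering fails the
   * assertion; an intentional change forces a conscious update of this table.
   */
  @Test
  public void testCodesAreFrozen() {
    Map<Class<?>, Integer> expected = new HashMap<Class<?>, Integer>();
    expected.put(Boolean.class, 1);
    expected.put(Integer.class, 2);
    expected.put(String.class, 3);
    assertEquals(expected, LegacyCodes.CLASS_TO_CODE);
  }
}
{code}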





[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread Michael Drzal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Drzal updated HBASE-5455:
-

Status: Patch Available  (was: Open)

Added a test case for the class-to-int mapping in HbaseObjectWritable to ensure 
wire compatibility.

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Priority: Minor
 Fix For: 0.94.0






[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread Michael Drzal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Drzal updated HBASE-5455:
-

Attachment: HBASE-5455.diff

Updated TestHbaseObjectWritable to fail on class-code changes that would 
affect the wire protocol.

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5455.diff






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5416:
--

Attachment: 5416-v5.txt

Patch v5 is based on v4, with grammatical corrections.

@Max:
What do you think?

@Override is missing for isFamilyEssential() in a few files.
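
For reference, a sketch of the fix, assuming the patch adds a method along the 
lines of boolean isFamilyEssential(byte[] name) to the Filter interface (take 
the exact signature from the patch; this snippet compiles only against a build 
that includes it):
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.filter.FilterBase;

/** Example filter that declares only one column family as essential. */
public class FlagOnlyFilter extends FilterBase {

  private byte[] essentialFamily;

  public FlagOnlyFilter() {}                      // required for Writable

  public FlagOnlyFilter(byte[] essentialFamily) {
    this.essentialFamily = essentialFamily;
  }

  @Override   // the annotation Zhihong asks for on every overriding implementation
  public boolean isFamilyEssential(byte[] name) {
    // Only the small family is needed to evaluate the filter; other families
    // can be loaded lazily by the joined scanner.
    return Arrays.equals(name, essentialFamily);
  }

  // Writable plumbing required by Filter in this HBase version.
  public void write(DataOutput out) throws IOException {
    out.writeInt(essentialFamily.length);
    out.write(essentialFamily);
  }

  public void readFields(DataInput in) throws IOException {
    essentialFamily = new byte[in.readInt()];
    in.readFully(essentialFamily);
  }
}
{code}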

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5416:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5416:
--

Status: Open  (was: Patch Available)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5416:
--

Attachment: 5416-v6.txt

Same as patch v5.
I verified that patch v6 can be used to generate a new review request.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-24 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215703#comment-13215703
 ] 

Zhihong Yu commented on HBASE-5317:
---

Integrated into the 0.92 branch.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, 
 HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, 
 HBASE-5317-v6.patch, HBASE-5317to0.92.patch, 
 TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215720#comment-13215720
 ] 

Nicolas Spiegelberg commented on HBASE-5416:


Overall, I agree that this is a useful design pattern.  We use this pattern in 
our messages deployment and other production use cases as well.  I'm more 
concerned about this being in the critical path.  This is deep in the core 
logic, which has a lot of complicated usage and is extremely bug-prone (even 
after extensive unit tests).

If you don't need atomicity, then you don't get much benefit from solving this 
in the critical path.  The change introduces a lot of risk and design decisions 
that we have to worry about years later.  It might be some work to understand 
how to use a batch factor; but don't you think it would take more work to 
understand the variety of use cases for scans to ensure that we don't introduce 
side effects and make a scalable architectural decision?

At the very least, we should get a scan expert to look at this code before 
committing.  I'm not one, but I know this isn't the same as making a business 
logic change.  I just have one question about the patch right now: should we 
have a unit test case for ensuring the interop between this feature and 'limit'? 
For example, ensure that joinedHeap is scanned before going to the next row if 
the storeHeap results.size() == limit.


 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch


 When a scan is performed, the whole row is loaded into the result list, and only then 
 is the filter (if any) applied to decide whether the row is needed.
 But when a scan covers several CFs and the filter checks data from only a subset of 
 them, the data from the unchecked CFs is not needed at the filter stage; it is needed 
 only once we have decided to include the row. In that case we can significantly reduce 
 the amount of IO performed by the scan, by loading only the values the filter actually 
 checks.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with some flag 
 specified, we use SingleColumnValueFilter to limit the result to only a small subset 
 of the region. But the current implementation loads both CFs to perform the scan, 
 when only the small subset is needed.
 The attached patch adds one routine to the Filter interface that lets a filter 
 specify which CFs it needs for its operation. In HRegion, we separate all scanners 
 into two groups: those needed by the filter and the rest (joined). When a new 
 row is considered, only the needed data is loaded and the filter is applied; only if 
 the filter accepts the row is the rest of the data loaded. On our data, this speeds up 
 such scans 30-50 times. It also gives us a way to better normalize the data into 
 separate columns by optimizing the scans performed.
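
To make the proposed hook concrete, here is a rough sketch of the idea; the interface and method names (FamilyAwareFilter, getEssentialFamilies) are invented for illustration and are not the names used in the attached patch.

{code}
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch only: the real patch adds a method to HBase's Filter
// interface; the names used here are invented for illustration.
interface FamilyAwareFilter {
  /** Column families the filter must see before deciding whether to keep a row. */
  Set<byte[]> getEssentialFamilies();
}

class FlagOnlyFilter implements FamilyAwareFilter {
  private static final byte[] FLAGS = "flags".getBytes();

  public Set<byte[]> getEssentialFamilies() {
    // Only the small 'flags' family is needed to decide inclusion; the large
    // 'snap' family would be loaded lazily, after the filter accepts the row.
    return Collections.singleton(FLAGS);
  }
}
{code}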

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy

2012-02-24 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5437:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 HRegionThriftServer does not start because of a bug in 
 HbaseHandlerMetricsProxy
 ---

 Key: HBASE-5437
 URL: https://issues.apache.org/jira/browse/HBASE-5437
 Project: HBase
  Issue Type: Bug
  Components: metrics, thrift
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.94.0

 Attachments: HBASE-5437.D1857.1.patch, HBASE-5437.D1887.1.patch, 
 HBASE-5437.D1887.2.patch


 3.facebook.com,60020,1329865516120: Initialization of RS failed.  Hence 
 aborting RS.
 java.lang.ClassCastException: $Proxy9 cannot be cast to 
 org.apache.hadoop.hbase.thrift.generated.Hbase$Iface
 at 
 org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47)
 at 
 org.apache.hadoop.hbase.thrift.ThriftServerRunner.init(ThriftServerRunner.java:239)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658)
 at java.lang.Thread.run(Thread.java:662)
 2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215757#comment-13215757
 ] 

Lars Hofhansl commented on HBASE-5075:
--

Actually, the patches do not apply cleanly to HBase trunk.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the hmaster; and once the 
 hmaster knows about the regionserver's shutdown, it takes a long time to fetch the 
 hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid does not exist, I assume the rs is down, delete its znode, and 
 force-close the hlog file.
 That way the detection period could be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Adrian Muraru (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215759#comment-13215759
 ] 

Adrian Muraru commented on HBASE-5351:
--

Saw the same issue in the 0.92 branch and traced it down to the same
{noformat}this.hbAdmin.createTableAsync(htd, keys);{noformat}
and I'm wondering why we wouldn't change this to
{noformat}this.hbAdmin.createTable{noformat}
instead of looping and waiting for the table to become available.
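
A minimal sketch of that synchronous variant, assuming the table descriptor and split keys are built the way LoadIncrementalHFiles already builds them; the column family name below is a placeholder and this is not the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class SyncCreateExample {
  // Create the table synchronously, then open a handle to it; no
  // availability-polling loop is needed afterwards.
  public static HTable createAndOpen(String tableName, byte[][] splitKeys)
      throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor htd = new HTableDescriptor(tableName);
    htd.addFamily(new HColumnDescriptor("cf")); // placeholder family name
    admin.createTable(htd, splitKeys);          // blocks until creation completes
    return new HTable(conf, tableName);
  }
}
{code}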

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && 
 (ctr<TABLE_CREATE_MA
 +while (!this.hbAdmin.isTableAvailable(tableName) && 
 (ctr<TABLE_CREATE_MAX_R
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-02-24 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215769#comment-13215769
 ] 

Himanshu Vashishtha commented on HBASE-4348:


I have created a patch, which involves a new method in 
org.apache.hadoop.hbase.master.AssignmentManager and supporting code in 
src/main/jamon/org/apache/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon.
 
I am running it on my local system, and I am wondering how to test this, i.e., how to 
get some regions into RIT. Any suggestions, please?
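
To make the metrics concrete, here is a hedged sketch of how the three values could be computed from a map of region name to the time the region entered transition; the class and field names are invented and this is not the attached patch. (One way to get regions into RIT for testing is to kill a region server so its regions must be reassigned.)

{code}
import java.util.Map;

// Illustrative only: the real patch adds a method to AssignmentManager. The
// names here are invented; the input is assumed to be a map of region name ->
// time (ms) at which the region entered transition.
class RegionsInTransitionMetrics {
  static final long OVER_THRESHOLD_MS = 60 * 1000;

  int ritCount;               // regions currently in transition
  int ritCountOverThreshold;  // in transition for more than a minute
  long ritOldestAgeSeconds;   // age of the oldest region in transition

  void update(Map<String, Long> transitionStartTimes, long nowMs) {
    ritCount = transitionStartTimes.size();
    ritCountOverThreshold = 0;
    long oldestStart = nowMs;
    for (long start : transitionStartTimes.values()) {
      if (nowMs - start > OVER_THRESHOLD_MS) {
        ritCountOverThreshold++;
      }
      if (start < oldestStart) {
        oldestStart = start;
      }
    }
    ritOldestAgeSeconds = (nowMs - oldestStart) / 1000;
  }
}
{code}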

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob

 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds has the oldest region-in-transition been in transition

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215771#comment-13215771
 ] 

Max Lapan commented on HBASE-5416:
--

@Nicolas:
Still, I have no idea how to solve our slow-scan problem in a different way. 
A two-phase rpc would be very inefficient in a map-reduce job, where we need to 
issue lots of gets for each obtained 'flag' row and have no good place to save 
them for a multi-get (which could be huge in some cases). Batching also helps 
little there, because the slowness is not caused by large Results, but by the tons 
of useless work performed by the regionserver on such scans. Or maybe I missed 
something?

I agree that this solution is not elegant and complicates the scan machinery, but 
all the other approaches look worse.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch


 When a scan is performed, the whole row is loaded into the result list, and only then 
 is the filter (if any) applied to decide whether the row is needed.
 But when a scan covers several CFs and the filter checks data from only a subset of 
 them, the data from the unchecked CFs is not needed at the filter stage; it is needed 
 only once we have decided to include the row. In that case we can significantly reduce 
 the amount of IO performed by the scan, by loading only the values the filter actually 
 checks.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with some flag 
 specified, we use SingleColumnValueFilter to limit the result to only a small subset 
 of the region. But the current implementation loads both CFs to perform the scan, 
 when only the small subset is needed.
 The attached patch adds one routine to the Filter interface that lets a filter 
 specify which CFs it needs for its operation. In HRegion, we separate all scanners 
 into two groups: those needed by the filter and the rest (joined). When a new 
 row is considered, only the needed data is loaded and the filter is applied; only if 
 the filter accepts the row is the rest of the data loaded. On our data, this speeds up 
 such scans 30-50 times. It also gives us a way to better normalize the data into 
 separate columns by optimizing the scans performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215784#comment-13215784
 ] 

Hudson commented on HBASE-5317:
---

Integrated in HBase-0.92 #302 (See 
[https://builds.apache.org/job/HBase-0.92/302/])
HBASE-5317  Fix TestHFileOutputFormat to work against hadoop 0.23
   (Gregory Taylor) (Revision 1293306)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java


 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, 
 HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, 
 HBASE-5317-v6.patch, HBASE-5317to0.92.patch, 
 TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215785#comment-13215785
 ] 

Jesse Yates commented on HBASE-5075:


Haven't had a chance to look at the latest patch yet, but have read through the 
docs. I have the same concern as Lars, namely,

bq. a bit worried about maintaining an additional process on every machine

What about doing something a bit simpler like adding a runtime shutdown hook to 
the RS such that the region server will update ZK or the master when it decides 
to bail out. Even something as simple as just removing your own znode on 
failure would be sufficient to cover this use case, correct?
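
A hedged sketch of that idea, not HBase's actual shutdown path; the znode path is supplied by the caller and is a placeholder.

{code}
import org.apache.zookeeper.ZooKeeper;

// Sketch: register a JVM shutdown hook that deletes the region server's
// ephemeral znode so the master notices the death right away instead of
// waiting for the ZooKeeper session timeout.
public class RsShutdownHook {
  public static void install(final ZooKeeper zk, final String rsZnodePath) {
    Runtime.getRuntime().addShutdownHook(new Thread() {
      public void run() {
        try {
          zk.delete(rsZnodePath, -1);  // -1 matches any znode version
        } catch (Exception e) {
          // Best effort: if ZK is unreachable, the session timeout still applies.
        }
      }
    });
  }
}
{code}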

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the hmaster; and once the 
 hmaster knows about the regionserver's shutdown, it takes a long time to fetch the 
 hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid does not exist, I assume the rs is down, delete its znode, and 
 force-close the hlog file.
 That way the detection period could be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215789#comment-13215789
 ] 

Mikhail Bautin commented on HBASE-5416:
---

@Max: if you scan the 'flag' column family first, find the rows that you are 
interested in, and query only those rows from the 'snap' column family, you 
will avoid the slowness from scanning every row in 'snap'. With proper 
batching, the two-pass approach should work fine if you don't need atomicity.

The problem with such deep changes to the scanner framework is that it would 
require comprehensive new unit tests. The included unit test only writes three 
rows and does not really check the new feature (or the old functionality) on a 
large scale. Take a look at TestMultiColumnScanner and TestSeekOptimizations. 
We will need something at least as comprehensive as those tests for this 
improvement, probably even a multithreaded test case to ensure we don't break 
atomicity. If we do not do that testing now, we will still have to do it before 
the next stable release, but it would be unfair to pass the hidden costs of 
testing to those who don't need this particular optimization right now but will 
soon need a stable system for another production release.
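
For reference, a minimal sketch of the two-pass pattern described above, assuming a table with a small 'flags' family and a large 'snap' family; the qualifier, value, and batch size are placeholders, and the two passes give no row-level atomicity.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoPassScan {
  public static void run(Configuration conf, byte[] tableName) throws Exception {
    HTable table = new HTable(conf, tableName);
    // Pass 1: scan only the small 'flags' family with the value filter.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("flags"));
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("flags"),
        Bytes.toBytes("f"), CompareOp.EQUAL, Bytes.toBytes("1")));
    ResultScanner scanner = table.getScanner(scan);
    List<Get> batch = new ArrayList<Get>();
    for (Result flagRow : scanner) {
      Get get = new Get(flagRow.getRow());
      get.addFamily(Bytes.toBytes("snap")); // pass 2 touches only matching rows
      batch.add(get);
      if (batch.size() == 100) {            // flush in modest multi-get batches
        process(table.get(batch));
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      process(table.get(batch));
    }
    scanner.close();
    table.close();
  }

  private static void process(Result[] snapRows) {
    // application logic over the fetched 'snap' rows
  }
}
{code}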

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch


 When a scan is performed, the whole row is loaded into the result list, and only then 
 is the filter (if any) applied to decide whether the row is needed.
 But when a scan covers several CFs and the filter checks data from only a subset of 
 them, the data from the unchecked CFs is not needed at the filter stage; it is needed 
 only once we have decided to include the row. In that case we can significantly reduce 
 the amount of IO performed by the scan, by loading only the values the filter actually 
 checks.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with some flag 
 specified, we use SingleColumnValueFilter to limit the result to only a small subset 
 of the region. But the current implementation loads both CFs to perform the scan, 
 when only the small subset is needed.
 The attached patch adds one routine to the Filter interface that lets a filter 
 specify which CFs it needs for its operation. In HRegion, we separate all scanners 
 into two groups: those needed by the filter and the rest (joined). When a new 
 row is considered, only the needed data is loaded and the filter is applied; only if 
 the filter accepts the row is the rest of the data loaded. On our data, this speeds up 
 such scans 30-50 times. It also gives us a way to better normalize the data into 
 separate columns by optimizing the scans performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215796#comment-13215796
 ] 

stack commented on HBASE-5075:
--

bq. Even something as simple as just removing your own znode on failure would 
be sufficient to cover this use case, correct?

Let's do that regardless.  Good idea.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the hmaster; and once the 
 hmaster knows about the regionserver's shutdown, it takes a long time to fetch the 
 hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid does not exist, I assume the rs is down, delete its znode, and 
 force-close the hlog file.
 That way the detection period could be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5455:
-

Assignee: Michael Drzal
  Status: Patch Available  (was: Open)

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Assignee: Michael Drzal
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5455.diff


 HbaseObjectWritable has a static initialization block that assigns ints to 
 various classes.  The int is assigned by using a local variable that is 
 incremented after each use.  If someone adds a line in the middle of the 
 block, this throws off everything after the change, and can break client 
 compatibility.  There is already a comment to not add/remove lines at the 
 beginning of this block.  It might make sense to have a test against a static 
 set of ids.  If something gets changed unintentionally, it would at least 
 fail the tests.  If the change was intentional, at the very least the test 
 would need to get updated, and it would be a conscious decision.
 https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one 
 issue of this type.
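
A hedged sketch of what such a test could look like, not the attached HBASE-5455.diff; the codes and the lookup method below are placeholders that would be wired to HbaseObjectWritable's real class-to-code map.

{code}
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

// Pin a few class-to-code mappings so an accidental reordering of the static
// block shows up as a test failure.
public class TestCodeStability {
  private static final Map<Class<?>, Integer> EXPECTED = new HashMap<Class<?>, Integer>();
  static {
    EXPECTED.put(byte[].class, 11);   // placeholder values recorded once
    EXPECTED.put(String.class, 12);
  }

  @Test
  public void codesDoNotShift() {
    for (Map.Entry<Class<?>, Integer> e : EXPECTED.entrySet()) {
      int actual = lookupCode(e.getKey());
      assertEquals("code changed for " + e.getKey(), e.getValue().intValue(), actual);
    }
  }

  // Stand-in so the sketch compiles; the real test would consult
  // HbaseObjectWritable's class-to-code map instead.
  private int lookupCode(Class<?> c) {
    return EXPECTED.get(c);
  }
}
{code}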

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size

2012-02-24 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215798#comment-13215798
 ] 

Jean-Daniel Cryans commented on HBASE-4365:
---

FWIW running a 5TB upload took 18h.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Critical
  Labels: usability
 Fix For: 0.94.0

 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


 A few of us were brainstorming this morning about what the default region 
 size should be. There were a few general points made:
 - in some ways it's better to be too-large than too-small, since you can 
 always split a table further, but you can't merge regions currently
 - with HFile v2 and multithreaded compactions there are fewer reasons to 
 avoid very-large regions (10GB+)
 - for small tables you may want a small region size just so you can 
 distribute load better across a cluster
 - for big tables, multi-GB is probably best

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215800#comment-13215800
 ] 

stack commented on HBASE-5351:
--

@Adrian That seems like the way to go.

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && 
 (ctr<TABLE_CREATE_MA
 +while (!this.hbAdmin.isTableAvailable(tableName) && 
 (ctr<TABLE_CREATE_MAX_R
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215802#comment-13215802
 ] 

stack commented on HBASE-5455:
--

+1 Excellent

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Assignee: Michael Drzal
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5455.diff


 HbaseObjectWritable has a static initialization block that assigns ints to 
 various classes.  The int is assigned by using a local variable that is 
 incremented after each use.  If someone adds a line in the middle of the 
 block, this throws off everything after the change, and can break client 
 compatibility.  There is already a comment to not add/remove lines at the 
 beginning of this block.  It might make sense to have a test against a static 
 set of ids.  If something gets changed unintentionally, it would at least 
 fail the tests.  If the change was intentional, at the very least the test 
 would need to get updated, and it would be a conscious decision.
 https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one 
 issue of this type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4365) Add a decent heuristic for region size

2012-02-24 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215803#comment-13215803
 ] 

Jean-Daniel Cryans commented on HBASE-4365:
---

Oh and no concurrent mode failures, as I don't use dumb configurations. Also my 
ZK timeout is set to 20s.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Critical
  Labels: usability
 Fix For: 0.94.0

 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


 A few of us were brainstorming this morning about what the default region 
 size should be. There were a few general points made:
 - in some ways it's better to be too-large than too-small, since you can 
 always split a table further, but you can't merge regions currently
 - with HFile v2 and multithreaded compactions there are fewer reasons to 
 avoid very-large regions (10GB+)
 - for small tables you may want a small region size just so you can 
 distribute load better across a cluster
 - for big tables, multi-GB is probably best

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-59) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-59.


Resolution: Won't Fix

We are not going to change this now, I'd say (this issue is 4+ years old).

 Where hbase/mapreduce have analogous configuration parameters, they should be 
 named similarly
 -

 Key: HBASE-59
 URL: https://issues.apache.org/jira/browse/HBASE-59
 Project: HBase
  Issue Type: Improvement
  Components: mapred
Reporter: Michael Bieniosek
Priority: Trivial

 mapreduce has a configuration property called mapred.system.dir which 
 determines where in the DFS a jobtracker stores its data.  Similarly, hbase 
 has a configuration property called hbase.rootdir which does something very 
 similar.
 These should have the same name, eg. hbase.system.dir and 
 mapred.system.dir

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-587) Add auto-primary-key feature

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-587.
-

Resolution: Won't Fix

Doing as Harsh suggests. 

Hard to do this feature in a scalable way.

If wanted, we could do something like cassandra's time-based UUID to mint UUIDs 
that go in a chronological direction if that'd help.
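
A loose sketch of that time-ordered key idea, purely illustrative and not an HBase feature; the 16-byte layout is a placeholder.

{code}
import java.security.SecureRandom;

// The leading 8 bytes are the millisecond timestamp (big-endian), so generated
// keys sort roughly chronologically; the trailing 8 random bytes give uniqueness.
public class TimeOrderedKeys {
  private static final SecureRandom RANDOM = new SecureRandom();

  public static byte[] next() {
    long now = System.currentTimeMillis();
    byte[] key = new byte[16];
    for (int i = 7; i >= 0; i--) {       // big-endian timestamp prefix
      key[i] = (byte) (now & 0xff);
      now >>>= 8;
    }
    byte[] suffix = new byte[8];         // random suffix for uniqueness
    RANDOM.nextBytes(suffix);
    System.arraycopy(suffix, 0, key, 8, 8);
    return key;
  }
}
{code}

Note that keys that sort by time concentrate new writes on a single region, which is part of why a scalable built-in version of this is hard.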

 Add auto-primary-key feature
 

 Key: HBASE-587
 URL: https://issues.apache.org/jira/browse/HBASE-587
 Project: HBase
  Issue Type: New Feature
Reporter: Bryan Duxbury
Priority: Trivial

 Some folks seem to be interested in having their row keys automatically 
 generated in a unique fashion. Maybe we could do something like allow the 
 user to specify they want an automatic key, and then we'll generate a GUID 
 that's unique for that table and return it as part of the commit. Not sure 
 what the mechanics would look like exactly, but seems doable and it's going 
 to be a more prevalent use case as people start to put data into HBase first 
 without touching another system or pushing data without a natural unique 
 primary key.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-765) Adding basic Spring DI support to IndexConfiguration class.

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-765.
-

Resolution: Won't Fix

We don't have IndexConfiguration anymore in our code base.  Won't fix.

 Adding basic Spring DI support to IndexConfiguration class.
 ---

 Key: HBASE-765
 URL: https://issues.apache.org/jira/browse/HBASE-765
 Project: HBase
  Issue Type: Improvement
  Components: mapred
Affects Versions: 0.16.0, 0.1.0, 0.1.1, 0.1.2, 0.1.3
 Environment: n/a
Reporter: Ryan Smith
Priority: Minor
   Original Estimate: 20m
  Remaining Estimate: 20m

 Spring can configure classes/object graphs via XML. I am pretty much able to 
 configure the entire MR object graph to launch MR jobs via Spring, except for the 
 class IndexConfiguration.java. So instead of only using addFromXML() to 
 configure IndexConfiguration, it would be nice to add support so Spring could 
 set all the class variables needed to initialize IndexConfiguration 
 without invoking addFromXML().
 Since the class IndexConfiguration already has setters and getters for almost 
 all of its members, it is almost usable as a Spring configuration bean except for 
 one issue: there is no way to configure columnMap outside of calling addFromXML(). 
 The easiest way I can figure is to allow a setter for the column map and put 
 any logic for checking the map's integrity there. Adding a few methods to 
 IndexConfiguration.java should solve the issue.
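
A hedged sketch of the kind of setter the description asks for; ColumnConf and the field names are invented stand-ins for IndexConfiguration's real column map entries.

{code}
import java.util.HashMap;
import java.util.Map;

// Placeholder for whatever type IndexConfiguration keeps per indexed column.
class ColumnConf {
  String columnName;
}

class IndexConfigurationSketch {
  private final Map<String, ColumnConf> columnMap = new HashMap<String, ColumnConf>();

  /** Spring-friendly setter that validates the map instead of requiring addFromXML(). */
  public void setColumnMap(Map<String, ColumnConf> columns) {
    if (columns == null || columns.isEmpty()) {
      throw new IllegalArgumentException("at least one indexed column is required");
    }
    columnMap.clear();
    columnMap.putAll(columns);
  }

  public Map<String, ColumnConf> getColumnMap() {
    return columnMap;
  }
}
{code}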

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1012) [performance] Try doctoring a dfsclient so it shortcircuits hdfs when blocks are local

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1012.
--

Resolution: Duplicate

This is done, available in hdfs.
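
A hedged example of turning the feature on from client configuration; the property name is the one used in later Hadoop releases and may differ or need extra settings depending on the Hadoop version in use.

{code}
import org.apache.hadoop.conf.Configuration;

// Enable HDFS short-circuit local reads on the client configuration.
public class ShortCircuitReads {
  public static Configuration enable(Configuration conf) {
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    return conf;
  }
}
{code}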

 [performance] Try doctoring a dfsclient so it shortcircuits hdfs when blocks 
 are local
 --

 Key: HBASE-1012
 URL: https://issues.apache.org/jira/browse/HBASE-1012
 Project: HBase
  Issue Type: Task
  Components: performance
Reporter: stack

 Ning Li up on the list has stated that getting blocks through hdfs when the block 
 is local takes almost the same amount of time as accessing the block over the 
 network. See if we can do something smarter when the data is known to be local, 
 short-circuiting hdfs if we can in a subclass of DFSClient (George Porter 
 suggestion).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1339) NPE in HCM.procesRow called from master.jsp

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1339.
--

Resolution: Won't Fix

No longer pertinent.  We don't see this any more.

 NPE in HCM.procesRow called from master.jsp
 ---

 Key: HBASE-1339
 URL: https://issues.apache.org/jira/browse/HBASE-1339
 Project: HBase
  Issue Type: Bug
Reporter: stack

 {code}
 2009-04-22 02:10:34,710 WARN /: /master.jsp:
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers$1.processRow(HConnectionManager.java:344)
 at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:64)
 at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:29)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:351)
 at 
 org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:121)
 at 
 org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:121)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
 at 
 org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
 at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
 at 
 org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
 at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
 at org.mortbay.http.HttpServer.service(HttpServer.java:954)
 at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
 at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
 at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
 at 
 org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
 at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
 at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1748) ClusterStatus needs to print out who has master role

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1748.
--

Resolution: Duplicate

Fixed by 'HBASE-5209 HConnection/HMasterInterface should allow for way to get 
hostname of currently active master in multi-master HBase setup'

 ClusterStatus needs to print out who has master role
 

 Key: HBASE-1748
 URL: https://issues.apache.org/jira/browse/HBASE-1748
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: stack
Priority: Trivial
 Attachments: HBASE-1748.patch


 Is in zk_dump but not in clusterstatus.
 You need it when you have 5 masters and you are trying to find the UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1559) IllegalThreadStateException during LocalHBaseCluster shutdown if more than one regionserver is started

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1559.
--

Resolution: Won't Fix

We don't see this anymore.  Reopen if it happens again.

 IllegalThreadStateException during LocalHBaseCluster shutdown if more than 
 one regionserver is started
 --

 Key: HBASE-1559
 URL: https://issues.apache.org/jira/browse/HBASE-1559
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Priority: Minor

 IllegalThreadStateException during LocalHBaseCluster shutdown if more than 
 one regionserver is started:
 {noformat}
 Thread [RegionServer:1] (Suspended (exception IllegalThreadStateException))
 FileSystem$ClientFinalizer(Thread).start() line: 595
 HRegionServer.runThread(Thread,long) line: 691
 HRegionServer.run() line: 675
 LocalHBaseCluster$RegionServerThread(Thread).run() line: 691
 {noformat}
 If started with only one region server, shut down is clean.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1109) Explore the possibility of storing the configuration files in Zookeeper

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1109.
--

Resolution: Won't Fix

This is a duplicate of HBASE-3909, I'd say; also, it's an axiom of ours that we not 
put permanent data into zk, which this issue would seem to imply.

 Explore the possibility of storing the configuration files in Zookeeper
 ---

 Key: HBASE-1109
 URL: https://issues.apache.org/jira/browse/HBASE-1109
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
Priority: Minor

 Someone on IRC was saying that Google uses Chubby to store their 
 configuration files. We should explore that solution with ZK. It has big 
 benefits IMO.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-1213) [performance] Investigate Locking Contention in the Write Path

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-1213.
--

Resolution: Duplicate

Resolving as a duplicate of the WAL batching work that has been done of late; 
this issue talks about batching writes going into the WAL.

 [performance] Investigate Locking  Contention in the Write Path
 

 Key: HBASE-1213
 URL: https://issues.apache.org/jira/browse/HBASE-1213
 Project: HBase
  Issue Type: Improvement
  Components: performance
Affects Versions: 0.19.0
Reporter: Ben Maurer
Assignee: stack

 When doing a large number of bulk updates from different clients, I noticed 
 that there was a high level of lock contention for things like locking the 
 HLog. It seems that each thread acquires the lock for a single BatchUpdate and 
 releases the lock, and then another thread owns the lock before the initial 
 writer gets to its next update. Having the threads bounce around may lead to 
 suboptimal performance.
 This should be benchmarked and maybe changed to have less context switching.
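
A generic group-commit sketch of the batching idea, not HBase's HLog code: writers enqueue edits and a single appender thread drains the queue, so the log lock is taken once per batch instead of once per update.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class GroupCommitLog {
  private final LinkedBlockingQueue<byte[]> pending = new LinkedBlockingQueue<byte[]>();

  // Called by many writer threads.
  public void append(byte[] edit) {
    pending.add(edit);
  }

  // Run by one dedicated appender thread in a loop.
  public void flushBatch() throws InterruptedException {
    List<byte[]> batch = new ArrayList<byte[]>();
    batch.add(pending.take());   // block until at least one edit is queued
    pending.drainTo(batch);      // then grab everything else that piled up
    synchronized (this) {        // single lock acquisition per batch
      for (byte[] edit : batch) {
        writeToLog(edit);
      }
      sync();
    }
  }

  private void writeToLog(byte[] edit) { /* write bytes to the underlying file */ }

  private void sync() { /* force the batch to disk */ }
}
{code}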

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-02-24 Thread Devaraj Das (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215861#comment-13215861
 ] 

Devaraj Das commented on HBASE-5451:


Thanks Stack, for the quick but detailed review...

bq. if (!head.hasUserInfo()) return;
bq. .. Then you'd save an indent of the whole body of the method.

Makes sense

bq. Seems like ticket should be renamed user (we seem to be creating a user 
rather than a ticket?) here – I like the way you ask user to create passing the 
header:

Makes sense

bq. Is ConnectionContext actually the headers? Should it be called 
ConnectionHeader?

Ok

bq. What is this – HBaseCompleteRpcRequestProto? Its 'The complete RPC request 
message'. Its the callid and the client request. Is it the complete request 
because its missing the header? Should it just be called Request since its 
inside a package that makes its provinence clear? I suppose request would be 
odd because you then do getRequest on it... hmm.

The CompleteRPCRequest message is composed of the RPC callID and the 
application RPC message (currently either a Writable or a PB). I wanted to 
distinguish between the two, but let me look at renaming ..

bq. Why tunnelRequest. Whats that mean?

Currently, the RPC client only works with Writables. We will need to tunnel 
Writable RPC messages until we have PB for all the app layer protocols. Kindly 
have a look at the client side where the writable RPC message is serialized for 
sending it to the server.

bq. Fatten doc on the proto file I'd say. Its going to be our spec.

Ok

bq. Can these proto classes drop the HBaseRPC prefix? Is the Proto suffix going 
to be our convention denoting Proto classes going forward?

Will drop the prefix. But I guess the suffix should stay..

bq. Are we doing to repeat the hrpc exception handling carrying Strings for 
exceptions from server to client?

Haven't done anything on this one yet. Let me see (this could be a separate 
jira IMO).

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Attachments: rpc-proto.patch.1_2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Gregory Chanan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-5351:
--

Attachment: HBASE-5351-v1.patch

*attached HBASE-5351-v1.patch

@Adrian and stack:
Agreed, I was just trying to make a minimal change.

New patch as suggested.

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && 
 (ctr<TABLE_CREATE_MA
 +while (!this.hbAdmin.isTableAvailable(tableName) && 
 (ctr<TABLE_CREATE_MAX_R
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Adrian Muraru (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215885#comment-13215885
 ] 

Adrian Muraru commented on HBASE-5351:
--

Great. What about the try/catch of java.net.SocketTimeoutException? I don't think it 
is needed anymore when the synchronous createTable is used. Let's let any exception 
thrown by the createTable() call bubble up. 
What do you say?

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215887#comment-13215887
 ] 

Hadoop QA commented on HBASE-5351:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515962/HBASE-5351-v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -131 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 155 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1044//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1044//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1044//console

This message is automatically generated.

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}


[jira] [Resolved] (HBASE-2073) IllegalArgumentException causing regionserver failure

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2073.
--

Resolution: Won't Fix

Not enough detail, and I don't think we've seen this lately.

 IllegalArgumentException causing regionserver failure
 -

 Key: HBASE-2073
 URL: https://issues.apache.org/jira/browse/HBASE-2073
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.2
 Environment: Ubuntu 8.10, Java 1.6.0_10, HBase 0.20.2
Reporter: Greg Lu
Priority: Minor
 Attachments: hbase-hadoop-regionserver-factory05.lab.mtl.log


 After a regionserver went down last night, I checked its logs and found the 
 following exception:
 2009-12-29 00:17:27,663 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll 
 /hbase/amsterdam_factory/.logs/factory05.lab.mtl,60020,1262042255724/hlog.dat.1262060247637,
  entries=1830, calcsize=22946017, filesize=22758899. New hlog 
 /hbase/amsterdam_factory/.logs/factory05.lab.mtl,60020,1262042255724/hlog.dat.1262063847659
 2009-12-29 00:34:36,210 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 java.lang.IllegalArgumentException
   at java.nio.Buffer.position(Buffer.java:218)
   at 
 org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1114)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:189)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2009-12-29 00:34:36,214 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 0 on 60020, call next(4170645244799815171, 1) from 
 192.168.1.108:53401: error: java.io.IOException: 
 java.lang.IllegalArgumentException
 java.io.IOException: java.lang.IllegalArgumentException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965)
   at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 Caused by: java.lang.IllegalArgumentException
   at java.nio.Buffer.position(Buffer.java:218)
   at 
 org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.next(HFile.java:1114)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:58)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:79)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:189)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:106)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   ... 5 more
 Looks like this bug was encountered before at 
 https://issues.apache.org/jira/browse/HBASE-1495 and spanned a few JIRAs. 
 It's supposed to be resolved as of 0.20.0, but we're running 0.20.2 and it 
 took down one of our regionservers.
 I'm also attaching more of the log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215895#comment-13215895
 ] 

stack commented on HBASE-5451:
--

Ok on the tunnel thing.  Maybe comment it some more (if you haven't already) in 
code.

Yeah on suffix.  We need a convention, I'd say, distinguishing the PB classes.

On exception, could do as a separate jira.  Here is one that looks like it's what 
you need and already exists, if it helps: HBASE-2030

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Attachments: rpc-proto.patch.1_2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-2142) Add number of RegionServers (live/dead) to JMX metrics in HMaster

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2142.
--

Resolution: Duplicate

Marking as a duplicate of HBASE-5325, which does more than this issue asks for, 
giving you the actual names of the live and dead servers:

{code}
+  /**
+   * Get the live region servers
+   * @return Live region servers
+   */
+  public Map<String, HServerLoad> getRegionServers();
+
+  /**
+   * Get the dead region servers
+   * @return Dead region Servers
+   */
+  public String[] getDeadRegionServers();
{code}
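
For reference, reading those two values back out is plain JMX client code; a rough sketch follows. The JMX service URL and the MBean ObjectName are assumptions (they depend on the jmxremote settings and on how the master registers the bean), not something defined by the patch.
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MasterJmxProbe {
  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint and object name; adjust to your deployment.
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://master-host:10101/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbsc = connector.getMBeanServerConnection();
      ObjectName master = new ObjectName("hadoop:service=Master,name=Master"); // assumed name
      // Attribute names follow the usual getter-to-attribute JMX convention.
      Object live = mbsc.getAttribute(master, "RegionServers");
      Object dead = mbsc.getAttribute(master, "DeadRegionServers");
      System.out.println("live: " + live);
      System.out.println("dead: " + dead);
    } finally {
      connector.close();
    }
  }
}
{code}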

 Add number of RegionServers (live/dead) to JMX metrics in HMaster
 -

 Key: HBASE-2142
 URL: https://issues.apache.org/jira/browse/HBASE-2142
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.20.2
Reporter: Lars George
Priority: Minor

 While commenting on HBASE-2117 I noticed that Hadoop's NameNode has that and 
 it makes sense to expose it too in HBase's HMaster metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5325:
-

   Resolution: Fixed
Fix Version/s: 0.92.1
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed branch and trunk.  Thanks for the nice patch and for being accommodating 
of feedback, Hitesh.

 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Gregory Chanan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-5351:
--

Attachment: HBASE-5351-v2.patch

*attached HBASE-5351-v2.patch*

You are quite right -- createTable catches the SocketTimeoutException anyway.

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1762) Remove concept of ZooKeeper from HConnection interface

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215946#comment-13215946
 ] 

stack commented on HBASE-1762:
--

This is being done as part of HBASE-5399

 Remove concept of ZooKeeper from HConnection interface
 --

 Key: HBASE-1762
 URL: https://issues.apache.org/jira/browse/HBASE-1762
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.0
Reporter: Ken Weiner
Assignee: stack
 Attachments: HBASE-1762.patch


 The concept of ZooKeeper is really an implementation detail and should not be 
 exposed in the {{HConnection}} interface.   Therefore, I suggest removing the 
 {{HConnection.getZooKeeperWrapper()}} method from the interface. 
 I couldn't find any uses of this method within the HBase code base except for 
 in one of the unit tests: {{org.apache.hadoop.hbase.TestZooKeeper}}.  This 
 unit test should be changed to instantiate the implementation of 
 {{HConnection}} directly, allowing it to use the {{getZooKeeperWrapper()}} 
 method.  This requires making 
 {{org.apache.hadoop.hbase.client.HConnectionManager.TableServers}} public.  
 (I actually think TableServers should be moved out into an outer class, but 
 in the spirit of small patches, I'll refrain from suggesting that in this 
 issue).
 I'll attach a patch for:
 # The removal of {{HConnection.getZooKeeperWrapper()}}
 # Change of {{TableServers}} class from private to public
 # Direct instantiation of {{TableServers}} within {{TestZooKeeper}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215945#comment-13215945
 ] 

stack commented on HBASE-5399:
--

Another thought:

Do we have to have getSharedZookeeperWatcher, releaseSharedZookeeperWatcher, 
getSharedMaster, etc., in the HConnection API?  Are these not implementation 
details?  (Or would it be too hard to undo them -- you'd have no way of counting 
zk and master connections?)

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5399_inprogress.patch, 5399_inprogress.v3.patch, 
 5399_inprogress.v9.patch


 The link is often considered an issue, for various reasons. One of them is 
 that there is a limit on the number of connections that ZK can manage. 
 Stack also suggested removing the link to the master from HConnection.
 There are choices to be made considering the existing API (which we don't want 
 to break).
 The first patches I will submit to hadoop-qa should not be committed: they 
 are here to show progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever it wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - reading the master address to create a master => now done with a temporary 
 zookeeper connection
 - reading the root location => now done with a temporary zookeeper connection, 
 but questionable. Used in the public function locateRegion. To be reworked.
 - reading the cluster id => now done once with a temporary zookeeper connection.
 - checking if the base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a thread pool => now done with a temporary zookeeper connection
 Master is used for:
 - getMaster, a public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master => 
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper: ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but it requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). In any case, a non-connected client will always 
 be noticeably slower, because it's a tcp connection, and establishing a tcp 
 connection is slow.
 - having a link between ZK and every client seems to make sense for some 
 use cases. However, it won't scale if a TCP connection is required for every 
 client.
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue with HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic, however.
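
 The "temporary zookeeper connection" mentioned above boils down to: open a session, read the one znode needed, close the session, so no long-lived ZK connection is held per client. A minimal sketch with the raw ZooKeeper client follows; the znode path is an assumed example, not part of the patch.
 {code}
 import org.apache.zookeeper.WatchedEvent;
 import org.apache.zookeeper.Watcher;
 import org.apache.zookeeper.ZooKeeper;

 public class TemporaryZkRead {
   /** Connect, read a single znode, disconnect. */
   public static byte[] readOnce(String quorum, String znode) throws Exception {
     ZooKeeper zk = new ZooKeeper(quorum, 30000, new Watcher() {
       @Override
       public void process(WatchedEvent event) {
         // no-op: nothing to watch for a one-shot read
       }
     });
     try {
       return zk.getData(znode, false, null);   // e.g. "/hbase/master" (assumed layout)
     } finally {
       zk.close();                              // the ZK session is released immediately
     }
   }
 }
 {code}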

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4932) Block cache can be mistakenly instantiated by tools

2012-02-24 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215947#comment-13215947
 ] 

Prakash Khemani commented on HBASE-4932:


Yes ... It is a good-to-have patch. Thanks.





 Block cache can be mistakenly instantiated by tools
 ---

 Key: HBASE-4932
 URL: https://issues.apache.org/jira/browse/HBASE-4932
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Fix For: 0.94.0

 Attachments: HBASE-4932.patch


 Map Reduce tasks that create a writer to write HFiles inadvertently end up 
 creating block cache.
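
A sketch of one common workaround, under the assumption that a zero hfile.block.cache.size means no block cache gets built for that process; this is not the attached patch, just the config-side way a write-only tool can avoid the cache.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class NoBlockCacheConf {
  /**
   * Build a Configuration for a tool/MR task that only writes HFiles.
   * Setting the cache size to 0 is generally taken to mean "no block cache".
   */
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setFloat("hfile.block.cache.size", 0f);
    return conf;
  }
}
{code}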

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-2310) Review how hbase does addresses throughout including in logs, ui and in code

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2310.
--

Resolution: Later

Resolving as Later.  It's a silly general task that just won't get done.

 Review how hbase does addresses throughout including in logs, ui and in code
 

 Key: HBASE-2310
 URL: https://issues.apache.org/jira/browse/HBASE-2310
 Project: HBase
  Issue Type: Task
Reporter: stack

 HBASE-2174 fixed the issue where we were doing dns lookup on each heartbeat 
 and it adds into .META. table hostname rather than IP.  This issue takes over 
 from hbase-2174 to make it so we run through all of hbase making sure we are 
 consistent in our use of hostname rather than IP everywhere.  See HBASE-2174 
 for other background that'll help with this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215973#comment-13215973
 ] 

stack commented on HBASE-5075:
--

This issue seems to be like 'HBASE-2342 Consider adding a watchdog node next to 
region server'

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the hmaster, and once 
 the hmaster knows about the regionserver's shutdown, it takes a long time to 
 recover the hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If that pid no longer exists, I consider the RS down, delete its znode, and 
 force-close the hlog file.
 The detection period could then be around 100ms.
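
 A rough sketch of the monitor idea above: check whether the regionserver pid is still alive and, if not, delete its ephemeral znode so the master reacts quickly. The pid check, the znode path and the omitted force-close of the hlog are all simplifications, not the attached patch.
 {code}
 import java.io.File;
 import org.apache.zookeeper.ZooKeeper;

 public class RegionServerPidMonitor {
   /**
    * One monitoring pass: if the regionserver process is gone, delete its znode.
    * Linux-only pid check; rsZnode is whatever path the RS registered under.
    */
   public static void checkOnce(ZooKeeper zk, int pid, String rsZnode) throws Exception {
     boolean alive = new File("/proc/" + pid).exists();
     if (!alive) {
       zk.delete(rsZnode, -1);   // -1 matches any znode version
     }
   }
 }
 {code}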

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-2351) publish hadoop + patch artifacts under org.apache.hbase groupId

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2351.
--

Resolution: Won't Fix

This issue is no longer applicable now that we run against published hadoops 
without needing patches.

 publish hadoop + patch artifacts under org.apache.hbase groupId 
 

 Key: HBASE-2351
 URL: https://issues.apache.org/jira/browse/HBASE-2351
 Project: HBase
  Issue Type: Sub-task
  Components: build
Reporter: Karthik K

 Similarly, the trunk of hbase currently depends on a couple of patches on top of 
 the hadoop 0.20.2 release, which is being actively worked on at HBASE-2255.  Once 
 that experiment succeeds, and before the 0.21.0 release, the artifacts need to be 
 published under the groupId org.apache.hbase and an artifactId such as 
 'hadoop-p1-p2-p3' (where p1, p2 and p3 are patch numbers, say). 
 The final pom.xml of hbase should be devoid of external references, for better 
 maintainability. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215981#comment-13215981
 ] 

Jesse Yates commented on HBASE-5075:


Yeah, very similar. Same issues with that ticket as before, namely wanting to 
keep HBase as simple and minimal as we can justify.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the hmaster, and once 
 the hmaster knows about the regionserver's shutdown, it takes a long time to 
 recover the hlog's lease.
 HBase is an online db, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If that pid no longer exists, I consider the RS down, delete its znode, and 
 force-close the hlog file.
 The detection period could then be around 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-2675) Quick smoke tests testsuite

2012-02-24 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2675.
--

Resolution: Fixed

Resolving as fixed by the default run of small tests (reopen if not sufficient 
in your estimation B)

 Quick smoke tests testsuite
 -

 Key: HBASE-2675
 URL: https://issues.apache.org/jira/browse/HBASE-2675
 Project: HBase
  Issue Type: Test
Reporter: Benoit Sigoure
Assignee: nkeywal
Priority: Minor

 It would be nice if there was a known subset of the tests that run fast (e.g. 
 not more than a few seconds) and quickly help us check whether the code isn't 
 horribly broken.  This way one could run those tests at a frequent interval 
 when iterating and only run the entire testsuite at the end, when they think 
 they're done, since doing so is very time consuming.
 Someone would need to identify which tests really focus on the core 
 functionality and add a target in the build system to just run those tests.  
 As a bonus, it would be awesome++ if the core tests ran, say, 10x faster than 
 they currently do.  There's a lot of sleep-based synchronization in the 
 tests and it would be nice to remove some of that where possible to make the 
 tests run as fast as the machine can handle them.
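
For reference, the "default run of small tests" mentioned in the resolution is driven by JUnit categories; a test opts in roughly as below, assuming the SmallTests marker interface in org.apache.hadoop.hbase. The test class and assertion are placeholders.
{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.SmallTests;
import org.junit.Test;
import org.junit.experimental.categories.Category;

@Category(SmallTests.class)   // picked up by the default (small) surefire run
public class TestQuickSmoke {
  @Test
  public void smokeCheck() {
    assertEquals(2, 1 + 1);   // placeholder; a real fast sanity check goes here
  }
}
{code}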

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5351:
-

Status: Open  (was: Patch Available)

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5351:
-

Status: Patch Available  (was: Open)

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5183) Render the monitored tasks as a treeview

2012-02-24 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215991#comment-13215991
 ] 

Mubarak Seyed commented on HBASE-5183:
--

I believe we need to present the monitored tasks as a TreeView + Table (treeTable).

The first column is a tree with root/node/leaf; the 2nd to 5th columns show 
startTime, description, state and status.

Something like http://ludo.cubicphuse.nl/jquery-plugins/treeTable/doc/

How do we group data?

Option 1: Group-by StartTime
{code}
Start time                     | Description                | State                            | Status
-------------------------------+----------------------------+----------------------------------+------------------------------
+ Mon Feb 20 15:10:08 PST 2012 | IPC Server handler 99 on 6 | WAITING (since 4mins, 8sec ago)  | Waiting for a call (since ..)
                               | IPC Server handler 20 on 6 | WAITING (since 2mins, 1sec ago)  | Waiting for a call (since ..)
+ Mon Feb 22 17:18:18 PST 2012 | IPC Server handler 40 on 6 | WAITING (since 0mins, 30sec ago) | Waiting for a call (since ..)
{code}

Option 2: Group-by State

Option 3: Group-by Status

I believe startTime is almost the same for all the IPC server handlers, since the 
Master and RS pages show the same startTime (it depends on when the daemons were 
restarted).

Options 2 and 3 are more of a text-based grouping. 

Thoughts?
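
Whatever widget renders the tree, the server side of option 1 is just a bucket-by-startTime step; a sketch follows. TaskRow is a hypothetical stand-in, not the real MonitoredTask API.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaskGrouping {
  /** Hypothetical row holder; the real code would read these off the monitored tasks. */
  public static class TaskRow {
    public final long startTime;
    public final String description, state, status;
    public TaskRow(long startTime, String description, String state, String status) {
      this.startTime = startTime;
      this.description = description;
      this.state = state;
      this.status = status;
    }
  }

  /** Option 1: one tree node per start time, with the matching rows underneath. */
  public static Map<Long, List<TaskRow>> groupByStartTime(List<TaskRow> rows) {
    Map<Long, List<TaskRow>> groups = new TreeMap<Long, List<TaskRow>>();
    for (TaskRow row : rows) {
      List<TaskRow> bucket = groups.get(row.startTime);
      if (bucket == null) {
        bucket = new ArrayList<TaskRow>();
        groups.put(row.startTime, bucket);
      }
      bucket.add(row);
    }
    return groups;
  }
}
{code}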

 Render the monitored tasks as a treeview
 

 Key: HBASE-5183
 URL: https://issues.apache.org/jira/browse/HBASE-5183
 Project: HBase
  Issue Type: Sub-task
Reporter: Zhihong Yu
Assignee: Mubarak Seyed
 Fix For: 0.92.2, 0.94.0


 Andy made the suggestion here:
 https://issues.apache.org/jira/browse/HBASE-5174?focusedCommentId=13184571page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13184571

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable

2012-02-24 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5455:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the idea and the patch!

 Add test to avoid unintentional reordering of items in HbaseObjectWritable
 --

 Key: HBASE-5455
 URL: https://issues.apache.org/jira/browse/HBASE-5455
 Project: HBase
  Issue Type: Test
Reporter: Michael Drzal
Assignee: Michael Drzal
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5455.diff


 HbaseObjectWritable has a static initialization block that assigns ints to 
 various classes.  The int is assigned by using a local variable that is 
 incremented after each use.  If someone adds a line in the middle of the 
 block, this throws off everything after the change, and can break client 
 compatibility.  There is already a comment to not add/remove lines at the 
 beginning of this block.  It might make sense to have a test against a static 
 set of ids.  If something gets changed unintentionally, it would at least 
 fail the tests.  If the change was intentional, at the very least the test 
 would need to get updated, and it would be a conscious decision.
 https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue 
 of this type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5351:
-

Attachment: HBASE-5351-v2.patch

Uploading the same patch so I can resubmit it to hadoopqa

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351-v2.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5351:
-

Status: Patch Available  (was: Open)

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351-v2.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)
 +while (!this.hbAdmin.isTableAvailable(tableName) && (ctr<TABLE_CREATE_MAX_RETRIES)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-02-24 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216009#comment-13216009
 ] 

Kannan Muthukkaruppan commented on HBASE-5416:
--

+1 to what Mikhail said.

Max: This is an interesting use case. I will take a closer look at the 
changes. But if it is indeed the case that the set of rows you need to look up 
in the second CF is a small % of the total data in that CF, then issuing 
subsequent gets (point lookups) for the relevant keys in that CF should work 
reasonably well, correct? BTW, are you doing this using HTableInputFormat? 
Perhaps you can detail the structure of your MR job more, and we can work 
through some specific options.
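
The point-lookup idea sketched as client code; the table handle, family name and row-key list are illustrative.
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FollowUpGets {
  /** After the filter pass over the small CF picks its rows, fetch only those rows from the big CF. */
  public static Result[] fetchBigFamily(HTable table, List<byte[]> selectedRows) throws Exception {
    List<Get> gets = new ArrayList<Get>(selectedRows.size());
    for (byte[] row : selectedRows) {
      Get get = new Get(row);
      get.addFamily(Bytes.toBytes("snap"));   // only the large CF is needed at this point
      gets.add(get);
    }
    return table.get(gets);                    // one batched round of point lookups
  }
}
{code}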


 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch


 When a scan is performed, the whole row is loaded into the result list, and 
 after that the filter (if one exists) is applied to decide whether the row is 
 needed.
 But when a scan is performed on several CFs and the filter checks only data from 
 a subset of those CFs, the data from the CFs not checked by the filter is not 
 needed at the filter stage; it is needed only once we have decided to include 
 the current row. In such a case we can significantly reduce the amount of IO 
 performed by a scan, by loading only the values actually checked by the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and it is quite costly to scan it. If we need only rows with some 
 flag specified, we use SingleColumnValueFilter to limit the result to only a 
 small subset of the region. But the current implementation loads both CFs to 
 perform the scan, when only a small subset is needed.
 The attached patch adds one routine to the Filter interface to allow a filter to 
 specify which CFs are needed for its operation. In HRegion, we separate all 
 scanners into two groups: those needed by the filter and the rest (joined). When 
 a new row is considered, only the needed data is loaded and the filter applied; 
 only if the filter accepts the row is the rest of the data loaded. On our data, 
 this speeds up such scans 30-50 times. It also gives us a way to better 
 normalize the data into separate columns by optimizing the scans performed.
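
 For concreteness, the flags/snap use case looks roughly like this on the client side (family, qualifier and value are illustrative). Today the server still loads both families for every row before the filter runs, which is exactly the IO the patch avoids.
 {code}
 import org.apache.hadoop.hbase.client.Scan;
 import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
 import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
 import org.apache.hadoop.hbase.util.Bytes;

 public class FlaggedScan {
   /** Scan both families, but keep only rows whose flag column matches. */
   public static Scan build() {
     Scan scan = new Scan();
     scan.addFamily(Bytes.toBytes("flags"));   // small CF the filter looks at
     scan.addFamily(Bytes.toBytes("snap"));    // large CF we actually want back
     SingleColumnValueFilter flagFilter = new SingleColumnValueFilter(
         Bytes.toBytes("flags"), Bytes.toBytes("included"),
         CompareOp.EQUAL, Bytes.toBytes("1"));
     flagFilter.setFilterIfMissing(true);      // drop rows that have no flag at all
     scan.setFilter(flagFilter);
     return scan;
   }
 }
 {code}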

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216019#comment-13216019
 ] 

stack commented on HBASE-4991:
--

bq. I feel some of the recent proposals / requirements are far more complex 
than the one

Yeah.  It seemed basic back in December.

bq. There wasn't such requirement when Mubarak outlined his plan

Pardon me.  I should have noticed the plan but did not.  Other priorities.  If 
I'd seen the plan I'd have blanched I think.

bq. Of course, having generic framework for all the master-coordinated tasks 
allows future additions to be concise.

Yep.  We'd have tested, proven primitives to build stuff on rather than having to 
do it per feature.

bq. But I think that should have been outlined clearly in the early stage of 
development of a feature.

See above.  Pardon me for missing how involved this addition became.

I don't see how the plan of '01/Feb/12 07:43' lays the foundation for a generic 
framework.  Am I missing something?  It seems like it's code for this feature 
only?




 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5350) Fix jamon generated package names

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5350:
-

Attachment: jamon_HBASE-5350.patch

Reattaching so hadoopqa can be rerun

 Fix jamon generated package names
 -

 Key: HBASE-5350
 URL: https://issues.apache.org/jira/browse/HBASE-5350
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.94.0

 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch


 Previously, jamon was creating the template files in org.apache.hbase, but 
 it should be org.apache.hadoop.hbase, so it's in line with the rest of the 
 source files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5350) Fix jamon generated package names

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5350:
-

Status: Patch Available  (was: Open)

 Fix jamon generated package names
 -

 Key: HBASE-5350
 URL: https://issues.apache.org/jira/browse/HBASE-5350
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.94.0

 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch


 Previously, jamon was creating the template files in org.apache.hbase, but 
 it should be org.apache.hadoop.hbase, so it's in line with the rest of the 
 source files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5350) Fix jamon generated package names

2012-02-24 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5350:
-

Status: Open  (was: Patch Available)

 Fix jamon generated package names
 -

 Key: HBASE-5350
 URL: https://issues.apache.org/jira/browse/HBASE-5350
 Project: HBase
  Issue Type: Bug
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.94.0

 Attachments: jamon_HBASE-5350.patch, jamon_HBASE-5350.patch


 Previously, jamon was creating the template files in org.apache.hbase, but 
 it should be org.apache.hadoop.hbase, so it's in line with the rest of the 
 source files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread dhruba borthakur (Created) (JIRA)
Metrics does not push pread time


 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Priority: Minor


The RegionServerMetrics is not pushing the pread times to the MetricsRecord
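
For context, RegionServerMetrics pushes its counters to the MetricsRecord in a 
periodic doUpdates() call, so the shape of the fix is likely a missing push 
there. A hedged sketch (the field names here are assumptions for illustration, 
not taken from the attached patch):

{code:java}
// In RegionServerMetrics.doUpdates(MetricsContext); sketch only.
// fsPreadLatency is assumed to be the MetricsTimeVaryingRate tracking pread times.
this.fsReadLatency.pushMetric(this.metricsRecord);    // read latency is pushed today
this.fsPreadLatency.pushMetric(this.metricsRecord);   // the missing push for pread time
this.fsWriteLatency.pushMetric(this.metricsRecord);
{code}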

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread dhruba borthakur (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-5473:


Assignee: dhruba borthakur
  Status: Patch Available  (was: Open)

 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor

 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216044#comment-13216044
 ] 

Phabricator commented on HBASE-5442:


mbautin has commented on the revision [jira] [HBASE-5442] [89-fb] Use builder 
pattern in StoreFile and HFile.

  This passed all unit tests.

REVISION DETAIL
  https://reviews.facebook.net/D1941


 Use builder pattern in StoreFile and HFile
 --

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.94.0

 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, 
 HFile-StoreFile-builder-2012-02-22_22_49_00.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses StoreFile and HFile refactoring. For 
 HColumnDescriptor refactoring see HBASE-5357.
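 
 As a plain illustration of the pattern being proposed (the class and setter 
 names here are hypothetical, not the actual HFile API):
 {code:java}
 // Hypothetical builder sketching the shape described above.
 public class WriterBuilder {
   private final Configuration conf;
   private int blockSize = 64 * 1024;      // defaults live in one place
   private String compression = "none";

   public WriterBuilder(Configuration conf) { this.conf = conf; }

   public WriterBuilder setBlockSize(int blockSize) {
     this.blockSize = blockSize;
     return this;                          // each setter returns the builder
   }

   public WriterBuilder setCompression(String compression) {
     this.compression = compression;
     return this;
   }

   public Writer build() {                 // the single place that constructs the writer
     return new Writer(conf, blockSize, compression);
   }
 }
 {code}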

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5473:
---

Attachment: D1947.1.patch

dhruba requested code review of [jira] [HBASE-5473] Metrics does not push 
pread time.
Reviewers: sc, tedyu

  Metrics does not push pread time.

TEST PLAN
  All unit tests pass. Also deployed on my local test cluster.

REVISION DETAIL
  https://reviews.facebook.net/D1947

AFFECTED FILES
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/4119/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5473:
---

Attachment: D1947.1.patch

dhruba requested code review of [jira] [HBASE-5473] Metrics does not push 
pread time.
Reviewers: sc, tedyu

  Metrics does not push pread time.

TEST PLAN
  All unit tests pass. Also deployed on my local test cluster.

REVISION DETAIL
  https://reviews.facebook.net/D1947

AFFECTED FILES
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/4119/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5473:
---

Attachment: D1947.1.patch

dhruba requested code review of [jira] [HBASE-5473] Metrics does not push 
pread time.
Reviewers: sc, tedyu

  Metrics does not push pread time.

TEST PLAN
  All unit tests pass. Also deployed on my local test cluster.

REVISION DETAIL
  https://reviews.facebook.net/D1947

AFFECTED FILES
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/4119/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216053#comment-13216053
 ] 

Phabricator commented on HBASE-5357:


Kannan has commented on the revision [jira] [HBASE-5357] [89-fb] Refactoring: 
use the builder pattern for HColumnDescriptor.

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:486 
Integer.MAX_VALUE seems to be the block size. Not sure why it was this in the 
past. But with the new behavior, you are defaulting back to the default (64k).
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:544 ditto.

REVISION DETAIL
  https://reviews.facebook.net/D1929


 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, D1929.1.patch, D1929.2.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5474) Shared the multiput thread pool for all the HTable instance

2012-02-24 Thread Liyin Tang (Created) (JIRA)
Shared the multiput thread pool for all the HTable instance
---

 Key: HBASE-5474
 URL: https://issues.apache.org/jira/browse/HBASE-5474
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang


Currently, each HTable instance has its own thread pool for the multiput 
operation. Each thread pool is actually an unbounded cached thread pool. So would 
it increase efficiency if this unbounded cached thread pool were shared across 
all HTable instances?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216059#comment-13216059
 ] 

Phabricator commented on HBASE-5357:


mbautin has commented on the revision [jira] [HBASE-5357] [89-fb] Refactoring: 
use the builder pattern for HColumnDescriptor.

INLINE COMMENTS
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:486 Yes, that 
block size looked like a bug to me, so I left it out. If someone explains to me 
why that was reasonable, I would be happy to add it back.

REVISION DETAIL
  https://reviews.facebook.net/D1929


 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, D1929.1.patch, D1929.2.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5442:
---

Attachment: D1941.2.patch

mbautin updated the revision [jira] [HBASE-5442] [89-fb] Use builder pattern 
in StoreFile and HFile.
Reviewers: JIRA, khemani, Kannan, Liyin, Karthik, nspiegelberg

  Removing irrelevant javadoc from StoreFile writer constructor

REVISION DETAIL
  https://reviews.facebook.net/D1941

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java
  src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFilePerformance.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Use builder pattern in StoreFile and HFile
 --

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.94.0

 Attachments: D1893.1.patch, D1893.2.patch, D1941.1.patch, 
 D1941.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses StoreFile and HFile refactoring. For 
 HColumnDescriptor refactoring see HBASE-5357.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216066#comment-13216066
 ] 

stack commented on HBASE-5075:
--

Rather than write a new supervisor, why not use something old school like 
http://supervisord.org/ ?  A wrapper script could clear the old znode from zk 
before restarting a new RS instance.

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the HMaster, and once 
 the HMaster knows about the regionserver's shutdown, it takes a long time to 
 recover the hlog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid no longer exists, I consider the RS down, delete its znode, and 
 force-close the hlog file.
 That way the detection period could be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216065#comment-13216065
 ] 

Hadoop QA commented on HBASE-5441:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12515639/HBASE-5441.D1857.4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1047//console

This message is automatically generated.

 HRegionThriftServer may not start because of a race-condition
 -

 Key: HBASE-5441
 URL: https://issues.apache.org/jira/browse/HBASE-5441
 Project: HBase
  Issue Type: Bug
  Components: thrift
Reporter: Scott Chen
Assignee: Scott Chen
Priority: Minor
 Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, 
 HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch


 This happens because the master is not started when ThriftServerRunner tries 
 to create an HBaseAdmin.
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not 
 running yet
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
 at $Proxy8.getProtocolVersion(Unknown Source)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649)
 at 
 org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:108)
 at 
 org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.<init>(ThriftServerRunner.java:516)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.<init>(HRegionThriftServer.java:104)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionThriftServer.<init>(HRegionThriftServer.java:74)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658)
 at java.lang.Thread.run(Thread.java:662)
 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba
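 
 One common way to tolerate this kind of startup race is to retry the HBaseAdmin 
 construction until the master is reachable; a hedged sketch of that idea (not 
 necessarily what the attached patches do):
 {code:java}
 // Sketch: back off and retry until the master answers.  The enclosing method
 // is assumed to declare the remaining checked exceptions.
 HBaseAdmin admin = null;
 while (admin == null) {
   try {
     admin = new HBaseAdmin(conf);
   } catch (MasterNotRunningException e) {
     Thread.sleep(1000);   // master not up yet
   }
 }
 {code}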

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216069#comment-13216069
 ] 

Hudson commented on HBASE-5325:
---

Integrated in HBase-0.92 #303 (See 
[https://builds.apache.org/job/HBase-0.92/303/])
HBASE-5325 Expose basic information about the master-status through jmx 
beans (Revision 1293417)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MXBean.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MXBean.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MXBeanImpl.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMXBean.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMXBean.java


 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.92.1, 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.2.patch, 
 HBASE-5325.3.branch-0.92.patch, HBASE-5325.3.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.
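 
 For reference, the general shape of such a bean (a generic JMX sketch; the 
 attribute names are illustrative, not the interface the patch adds):
 {code:java}
 // Generic MXBean sketch; getters surface as read-only JMX attributes.
 public interface MasterStatusMXBean {
   String getServerName();          // surfaces as the "ServerName" attribute
   int getRegionServerCount();      // surfaces as the "RegionServerCount" attribute
 }
 {code}
 An implementation would then be registered with the platform MBeanServer 
 (ManagementFactory.getPlatformMBeanServer().registerMBean(...)) under an 
 ObjectName such as hadoop:service=Master,name=MasterStatus; the ObjectName the 
 patch actually uses may differ.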

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance

2012-02-24 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-5474:
--

Description: 
Currently, each HTable instance will have a thread pool for the multiput 
operation. Each thread pool is actually a cached thread pool, which is bounded 
the number of region server. So the maximum number of threads will be ( # 
region server * # htable instance).  On the other hand, if all HTable instance 
could share this thread pool, the max number threads will still be the same. 
However, it will increase the thread pool efficiency.


  was:Currently, each HTable instance has its own thread pool for the multiput 
operation. Each thread pool is actually an unbounded cached thread pool. So would 
it increase efficiency if this unbounded cached thread pool were shared across 
all HTable instances?


 Shared the multiput thread pool for all the HTable instance
 ---

 Key: HBASE-5474
 URL: https://issues.apache.org/jira/browse/HBASE-5474
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 Currently, each HTable instance will have a thread pool for the multiput 
 operation. Each thread pool is actually a cached thread pool, which is 
 bounded by the number of region servers. So the maximum number of threads will 
 be (# of region servers * # of HTable instances).  On the other hand, if all 
 HTable instances could share this thread pool, the max number of threads will 
 still be the same, but thread pool efficiency will increase.
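 
 A hedged sketch of the sharing idea (assuming an HTable constructor that accepts 
 an ExecutorService; the exact API the eventual patch uses may differ):
 {code:java}
 // One process-wide pool shared by every HTable instance (sketch).
 ExecutorService sharedPool = Executors.newCachedThreadPool();

 // Assumed constructor taking an ExecutorService.
 HTable t1 = new HTable(conf, Bytes.toBytes("table1"), sharedPool);
 HTable t2 = new HTable(conf, Bytes.toBytes("table2"), sharedPool);
 {code}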

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5075) regionserver crashed and failover

2012-02-24 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216078#comment-13216078
 ] 

stack commented on HBASE-5075:
--

Looking in HRegionServer code, it looks like we delete our znode on the way out 
already.  Someone had your idea already, Jesse:

{code}
try {
  deleteMyEphemeralNode();
} catch (KeeperException e) {
  LOG.warn("Failed deleting my ephemeral node", e);
}
{code}

Maybe this is broken?

 regionserver crashed and failover
 -

 Key: HBASE-5075
 URL: https://issues.apache.org/jira/browse/HBASE-5075
 Project: HBase
  Issue Type: Improvement
  Components: monitoring, regionserver, replication, zookeeper
Affects Versions: 0.92.1
Reporter: zhiyuan.dai
 Fix For: 0.90.5

 Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, 
 HBase-5075-src.patch


 When a regionserver crashes, it takes too long to notify the HMaster, and once 
 the HMaster knows about the regionserver's shutdown, it takes a long time to 
 recover the hlog's lease.
 HBase is an online DB, so availability is very important.
 I have an idea to improve availability: a monitor node checks the regionserver's 
 pid. If the pid no longer exists, I consider the RS down, delete its znode, and 
 force-close the hlog file.
 That way the detection period could be about 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216077#comment-13216077
 ] 

Phabricator commented on HBASE-5473:


sc has commented on the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  looks good to me

REVISION DETAIL
  https://reviews.facebook.net/D1947


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216075#comment-13216075
 ] 

Phabricator commented on HBASE-5473:


sc has commented on the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  looks good to me

REVISION DETAIL
  https://reviews.facebook.net/D1947


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216076#comment-13216076
 ] 

Phabricator commented on HBASE-5473:


sc has commented on the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  looks good to me

REVISION DETAIL
  https://reviews.facebook.net/D1947


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216082#comment-13216082
 ] 

Phabricator commented on HBASE-5357:


stack has commented on the revision [jira] [HBASE-5357] [89-fb] Refactoring: 
use the builder pattern for HColumnDescriptor.

  Sounds like the old stuff was wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1929


 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, D1929.1.patch, D1929.2.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216084#comment-13216084
 ] 

Phabricator commented on HBASE-5473:


stack has accepted the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D1947

BRANCH
  svn


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216086#comment-13216086
 ] 

Phabricator commented on HBASE-5473:


stack has accepted the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D1947

BRANCH
  svn


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216085#comment-13216085
 ] 

Phabricator commented on HBASE-5473:


stack has accepted the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D1947

BRANCH
  svn


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5473) Metrics does not push pread time

2012-02-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216087#comment-13216087
 ] 

Phabricator commented on HBASE-5473:


stack has accepted the revision [jira] [HBASE-5473] Metrics does not push 
pread time.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D1947

BRANCH
  svn


 Metrics does not push pread time
 

 Key: HBASE-5473
 URL: https://issues.apache.org/jira/browse/HBASE-5473
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Priority: Minor
 Attachments: D1947.1.patch, D1947.1.patch, D1947.1.patch


 The RegionServerMetrics is not pushing the pread times to the MetricsRecord

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216089#comment-13216089
 ] 

Lars Hofhansl commented on HBASE-4991:
--

Maybe we should separate this feature from a generic framework?

For this issue we could just have one API: deleteRange(table, startKey, 
endKey). Initially it could validate that the startKey and endKey coincide with 
exactly one region; that way we can extend this later without having regions 
exposed in the API.
(We still need to avoid races with splitting and balancing of course - that 
almost makes it nicer to go back to the original approach of passing a region 
name.)

Just my $0.02.
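
The single API being suggested might look something like this (a signature 
sketch only; the parameter and exception types are assumptions):

{code:java}
// Hypothetical admin-side signature for the proposal above.  It would validate
// that [startKey, endKey) matches exactly one region before deleting it.
public void deleteRange(byte[] tableName, byte[] startKey, byte[] endKey)
    throws IOException;
{code}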


 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5351) hbase completebulkload to a new table fails in a race

2012-02-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216093#comment-13216093
 ] 

Hadoop QA commented on HBASE-5351:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515977/HBASE-5351-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -131 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 155 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1046//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1046//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1046//console

This message is automatically generated.

 hbase completebulkload to a new table fails in a race
 -

 Key: HBASE-5351
 URL: https://issues.apache.org/jira/browse/HBASE-5351
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.92.0, 0.94.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5351-v1.patch, HBASE-5351-v2.patch, 
 HBASE-5351-v2.patch, HBASE-5351.patch


 I have a test that tests vanilla use of importtsv with importtsv.bulk.output 
 option followed by completebulkload to a new table.
 This sometimes fails as follows:
 11/12/19 15:02:39 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table:
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: ml_items_copy, row=ml_items_copy,,99
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
 at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
 at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:875)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:929)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:817)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:781)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:247)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:211)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.createTable(LoadIncrementalHFiles.java:673)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:697)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
 at 
 org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:707)
 The race appears to be calling HbAdmin.createTableAsync(htd, keys) and then 
 creating an HTable object before that call has actually completed.
 The following change to 
 /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
 appears to fix the problem, but I have not been able to reproduce the race 
 reliably, in order to write a test.
 {code}
 -HTable table = new HTable(this.cfg, tableName);
 -
 -HConnection conn = table.getConnection();
  int ctr = 0;
 -while (!conn.isTableAvailable(table.getTableName()) && 
 (ctr<TABLE_CREATE_MA
 +
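 
 The quoted diff is cut off in this digest; the gist is to wait for the newly 
 created table to come online before instantiating the HTable. A hedged sketch 
 of that idea (not the literal patch; maxRetries is an illustrative bound):
 {code:java}
 // Sketch: block until the freshly created table is available before opening it.
 HBaseAdmin admin = new HBaseAdmin(conf);
 int ctr = 0;
 while (!admin.isTableAvailable(tableName) && ctr < maxRetries) {
   Thread.sleep(1000);   // give the asynchronous create time to finish
   ctr++;
 }
 HTable table = new HTable(conf, tableName);
 {code}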

[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216095#comment-13216095
 ] 

Jimmy Xiang commented on HBASE-3909:


Can we put the dynamic configuration somewhere in HDFS, for example in a file 
under hbase.rootdir?

We can put static configuration in hbase-site.xml, and dynamic configuration in 
a file under hbase.rootdir.

We can also enhance the hbase shell or the master UI to view/change those dynamic 
configurations.
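
A hedged sketch of what reading such a file could look like (the file name 
dynamic-site.xml and its location are assumptions, not an agreed design):

{code:java}
// Sketch: overlay dynamic settings stored under hbase.rootdir onto the live config.
Configuration conf = HBaseConfiguration.create();
Path dynamicFile = new Path(conf.get("hbase.rootdir"), "dynamic-site.xml");  // assumed name
FileSystem fs = dynamicFile.getFileSystem(conf);
if (fs.exists(dynamicFile)) {
  conf.addResource(fs.open(dynamicFile));   // later resources override earlier ones
}
{code}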


 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



