[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5074: --- Attachment: D1521.3.patch dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Reviewers: mbautin Many new goodies, thanks to the feedback from Mikhail and Todd. This completes my addressing of all the current review comments. If somebody can re-review it again, that would be great.
1. The bytesPerChecksum is configurable. One can set hbase.hstore.bytes.per.checksum in the config to set this. The default value is 16K. Similarly, one can set hbase.hstore.checksum.name to either CRC32 or CRC32C. The default is CRC32. If the PureJavaCRC32 algorithm is available in the classpath, then it is used; otherwise it falls back to using java.util.zip.CRC32. Each checksum value is assumed to be 4 bytes; it is currently not configurable (any comments here?). The reflection-based method of creating checksum objects is reworked to incur much lower overhead.
2. If an hbase-level CRC check fails, then it falls back to using hdfs-level checksums for the next few reads (defaults to 100). After that, it will retry using hbase-level checksums. I picked 100 as the default so that even in the case of continuous hbase-checksum failures, the overhead for additional iops is limited to 1%. Enhanced the unit test to validate this behaviour.
3. Enhanced unit tests to test different sizes of bytesPerChecksum. Also, added JMX metrics to record the number of times hbase-checksum verification failures occur.
REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java src/main/java/org/apache/hadoop/hbase/HConstants.java 
src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
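The classpath probe described in item 1 can be sketched roughly as follows. This is a hedged illustration, not the patch's actual ChecksumFactory: the class name org.apache.hadoop.util.PureJavaCRC32 is Hadoop's pure-Java CRC32 implementation, while the class and method names below are assumptions made for the example.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of the reflective probe: prefer Hadoop's PureJavaCRC32 when it is
// on the classpath, otherwise fall back to java.util.zip.CRC32.
public class ChecksumProbe {
    static final String PURE_JAVA_CRC32 = "org.apache.hadoop.util.PureJavaCRC32";

    // Return a Checksum instance, preferring PureJavaCRC32 when available.
    static Checksum newCrc32() {
        try {
            Class<?> clazz = Class.forName(PURE_JAVA_CRC32);
            return (Checksum) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Hadoop not on the classpath: use the JDK implementation.
            return new CRC32();
        }
    }

    public static void main(String[] args) {
        Checksum sum = newCrc32();
        byte[] data = "hbase-block".getBytes();
        sum.update(data, 0, data.length);
        System.out.println(sum.getClass().getName() + " -> " + sum.getValue());
    }
}
```

The comment also notes that the lookup result should be cached (e.g. resolving the Constructor once) so the reflection cost is not paid per checksum object.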
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5074: -- Status: Patch Available (was: Open) support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201200#comment-13201200 ] Hadoop QA commented on HBASE-5074: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513416/D1521.3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 76 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -133 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit org.apache.hadoop.hbase.io.hfile.TestHFileBlock org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/907//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/907//console This message is automatically generated. 
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201205#comment-13201205 ] Anoop Sam John commented on HBASE-2038: --- Hi Lars, I am also working on a secondary index and I have found the IHBase concept to be good. But we need this to be moved to a coprocessor-based approach so that the kernel code of HBase need not be different for the secondary index. IHBase makes the scan go through all the regions (as you said), but it will skip and seek to later positions in the heap, avoiding many of the possible data reads from HDFS etc. In the current coprocessor framework, we call preScannerNext() from HRegionServer next(final long scannerId, int nbRows) and pass the RegionScanner to the coprocessor. But per the IHBase approach, within the coprocessor we should be able to seek to the correct row where the indexed column value equals our value. We cannot do this as of now because RegionScanner has no seek(). Also, preScannerNext() will be called once before the actual next(final long scannerId, int nbRows) call happens on the region. Depending on the caching value at the client side, nbRows might be more than one. Now suppose nbRows=2 and the region has two matching rows, one somewhere in the middle of one HFile and the other in another HFile. Per IHBase we should first seek to the position of the first row and, after reading that data, seek to the next position. With the current way preScannerNext() is called, this won't be possible. So I think we might need some changes in this area? What do you say? Meanwhile, what is your plan: to continue with the IHBase way of storing the index in memory for each region, or some change to this?
Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so region level indexing can be reimplemented as a coprocessor without any loss of functionality.
[jira] [Created] (HBASE-5338) Add SKIP support to importtsv
Add SKIP support to importtsv -- Key: HBASE-5338 URL: https://issues.apache.org/jira/browse/HBASE-5338 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial It'd be nice to have support for SKIP mappings so that you can omit columns from the TSV during the import. For example {code} -Dimporttsv.columns=SKIP,HBASE_ROW_KEY,cf1:col1,cf1:col2,SKIP,SKIP,cf2:col1... {code} Or maybe HBASE_SKIP_COLUMN to be less ambiguous.
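A sketch of how the proposed SKIP marker might be handled when parsing the importtsv.columns mapping. The marker name "SKIP" follows the proposal above; the parser, class, and method names are illustrative assumptions, not ImportTsv's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for a columns spec: positions mapped to SKIP produce
// a null entry, meaning that TSV column is dropped during import.
public class ColumnSpecParser {
    // Returns one entry per TSV column, null for skipped positions.
    static List<String> parse(String spec) {
        List<String> out = new ArrayList<>();
        for (String col : spec.split(",")) {
            out.add(col.equals("SKIP") ? null : col);
        }
        return out;
    }
}
```

A mapper built on this would simply not emit a KeyValue for null entries, so extra TSV columns never reach HBase.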
[jira] [Created] (HBASE-5339) Add support for compound keys to importtsv
Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Reporter: Lars George
[jira] [Updated] (HBASE-5339) Add support for compound keys to importtsv
[ https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars George updated HBASE-5339: --- Component/s: mapreduce Description: Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed. Priority: Trivial (was: Major) Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed.
[jira] [Commented] (HBASE-5339) Add support for compound keys to importtsv
[ https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201255#comment-13201255 ] Lars George commented on HBASE-5339: Obviously, you can rearrange the compound key parts by using different HBASE_ROW_KEY_N values, where the ordering implied by N is up to the user. {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_3,cf1:col1,cf2:col3,HBASE_ROW_KEY_2 {code} Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed.
[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201351#comment-13201351 ] Nicolas Spiegelberg commented on HBASE-5335: @Lars: the original idea was to allow users to arbitrarily set KVs in the HTableDescriptor and HColumnDescriptor, but make it so users know that what they're doing is not checked. We need some sort of format to distinguish between reserved and non-reserved keywords (thinking of doing this on the client side). As a config value becomes more well-known, we can enforce limitations like you stated. I'd rather have this evolve by having a handful of users who want to set a config value, learn over the long term that it is useful, and incrementally refactor the code to ease support for that config. I don't want to get into a spot where we have to do a large refactor to support this feature and do extensive sanity checking, only to determine that we only need 20% of the config values. Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, schema Currently, the ability for a core developer to add per-table per-CF configuration settings is very heavyweight. You need to add a reserved keyword all the way up the stack, and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with using Configuration.get() a lot because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put/read arbitrary KV settings in the HBase schema. Combined with online schema change, this will allow us to safely iterate on configuration settings.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201354#comment-13201354 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4833 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10616 Do we need to be sorting rowsToLock? I'm thinking of multiple concurrent mutateRows operation, trying to lock the same set of rows. Perhaps, throwing IOException is going to prevent us from a situation where we end up with a deadlock. But, we still might want to sort it to ensure (better) progress (no livelock). - Amitanand On 2012-02-03 19:59:55, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-03 19:59:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. 
Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201365#comment-13201365 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- bq. On 2012-02-06 15:52:43, Amitanand Aiyer wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4212 bq. https://reviews.apache.org/r/3748/diff/2/?file=72266#file72266line4212 bq. bq. Do we need to be sorting rowsToLock? bq. bq. I'm thinking of multiple concurrent mutateRows operation, trying to lock the same set of rows. bq. bq. Perhaps, throwing IOException is going to prevent us from a situation where we end up with a deadlock. But, we still might want to sort it to ensure (better) progress (no livelock). MutateRows sorts them (by using a TreeSet with Bytes.BYTES_COMPARATOR), for exactly this reason. Maybe this should be called out here, by making the argument a SortedSet. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4833 --- On 2012-02-03 19:59:55, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-03 19:59:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. 
Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in
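The lock-ordering point from the review above (sort rowsToLock so concurrent mutateRows calls over the same rows cannot deadlock) can be sketched as follows. The comparator is an illustrative stand-in for HBase's Bytes.BYTES_COMPARATOR (unsigned lexicographic byte order); the class and method names are assumptions for the example, not HRegion's actual code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Sketch: acquire row locks in a canonical (sorted) order so two concurrent
// multi-row mutations over overlapping row sets always lock in the same
// order and therefore cannot deadlock.
public class RowLockOrdering {
    // Unsigned lexicographic byte order, mimicking Bytes.BYTES_COMPARATOR.
    static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    };

    // Rows are deduplicated and sorted; taking locks in iteration order
    // over this set is deadlock-free across concurrent callers.
    static TreeSet<byte[]> sortedRowsToLock(List<byte[]> rows) {
        TreeSet<byte[]> sorted = new TreeSet<>(BYTES_COMPARATOR);
        sorted.addAll(rows);
        return sorted;
    }
}
```

Making the parameter a SortedSet, as Lars suggests, would encode this precondition in the method signature instead of relying on callers to remember it.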
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201400#comment-13201400 ] Phabricator commented on HBASE-5074: tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:425 This cast is not safe. See https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/: Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.DistributedFileSystem cannot be cast to org.apache.hadoop.hbase.util.HFileSystem at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:425) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:433) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:407) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:328) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:326) src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 Should we default to CRC32C ? src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:2 No year is needed. src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:59 Shall we name this variable ctor ? Similar comment applies to other meth variables in this patch. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. 
This means that every read into the HBase block cache actually consumes two disk iops, one to the data file and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
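Ted's inline comment above flags the hard cast at HFile.java:425, which throws ClassCastException when a plain DistributedFileSystem is passed in. A sketch of the check-then-wrap pattern that avoids such a cast failure, using stand-in classes rather than the actual Hadoop/HBase types, and not the patch's eventual fix:

```java
// Illustrative only: FileSystem/HFileSystem here are stand-ins for the
// Hadoop/HBase classes; the real fix in the patch may differ.
public class CastGuardDemo {
    static class FileSystem {}

    // Hypothetical wrapper type, analogous to HFileSystem wrapping a FileSystem.
    static class HFileSystem extends FileSystem {
        final FileSystem underlying;
        HFileSystem(FileSystem underlying) { this.underlying = underlying; }
    }

    // Instead of an unconditional (HFileSystem) cast, reuse the instance if it
    // already has the right type and wrap it otherwise.
    static HFileSystem asHFileSystem(FileSystem fs) {
        if (fs instanceof HFileSystem) {
            return (HFileSystem) fs;   // safe: checked first
        }
        return new HFileSystem(fs);    // wrap rather than ClassCastException
    }

    public static void main(String[] args) {
        FileSystem plain = new FileSystem();          // like DistributedFileSystem
        HFileSystem wrapped = asHFileSystem(plain);   // no ClassCastException
        assert wrapped.underlying == plain;
        assert asHFileSystem(wrapped) == wrapped;     // already the right type
    }
}
```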
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5074:
------------------------------
Comment: was deleted (was: a duplicate of tedyu's review comment above)

REVISION DETAIL
https://reviews.facebook.net/D1521
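The ChecksumFactory comments in the HBASE-5074 review above (name the constructor variable `ctor`; rework reflection to lower overhead) boil down to: resolve the `Constructor` once, then reuse it for every checksum object. A hedged sketch using `java.util.zip.CRC32` as the configured class; the class and method names here are illustrative, not the patch's:

```java
import java.lang.reflect.Constructor;
import java.util.zip.Checksum;

public class ChecksumFactoryDemo {
    // Reflective lookup happens once; per-call cost is just newInstance().
    private final Constructor<? extends Checksum> ctor;

    ChecksumFactoryDemo(String className) {
        try {
            Class<? extends Checksum> clazz =
                Class.forName(className).asSubclass(Checksum.class);
            this.ctor = clazz.getDeclaredConstructor();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load checksum class " + className, e);
        }
    }

    Checksum newChecksum() {
        try {
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChecksumFactoryDemo factory = new ChecksumFactoryDemo("java.util.zip.CRC32");
        Checksum c1 = factory.newChecksum();
        Checksum c2 = factory.newChecksum();
        byte[] data = "hbase".getBytes();
        c1.update(data, 0, data.length);
        c2.update(data, 0, data.length);
        assert c1.getValue() == c2.getValue(); // deterministic across instances
        assert c1 != c2;                       // fresh object per call
    }
}
```

The same factory could be pointed at a CRC32C implementation when one is on the classpath, which is the fallback behavior the review discusses.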
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201408#comment-13201408 ]

Zhihong Yu commented on HBASE-5267:
-----------------------------------
@J-D: Do you want to take a look at patch v3?

Add a configuration to disable the slab cache by default
--------------------------------------------------------

Key: HBASE-5267
URL: https://issues.apache.org/jira/browse/HBASE-5267
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
Fix For: 0.94.0, 0.92.1
Attachments: 5267.txt, 5267v2.txt, 5267v3.txt

From what I commented at the tail of HBASE-4027:
{quote}
I changed the release note, the patch doesn't have a hbase.offheapcachesize configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize (which is actually a big problem when you consider this: http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak).
{quote}
We need to add hbase.offheapcachesize and set it to false by default. Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.
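What the comment asks for is a boolean switch that keeps the slab cache off unless explicitly enabled. Assuming the property name proposed in the comment (hbase.offheapcachesize is the proposal's name and semantics, not a released HBase configuration), the hbase-site.xml entry might look like:

```xml
<!-- Hypothetical: property name and default taken from the comment's proposal. -->
<property>
  <name>hbase.offheapcachesize</name>
  <value>false</value>
  <description>
    Enable the off-heap slab cache. Off by default, so that setting
    -XX:MaxDirectMemorySize alone no longer turns the cache on.
  </description>
</property>
```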
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201431#comment-13201431 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
I don't see a ClassNotFoundException in the test output, but the following may provide a clue:
{code}
2012-02-06 09:44:48,377 WARN  [main] mapreduce.JobSubmitter(139): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-02-06 09:44:48,380 WARN  [main] mapreduce.JobSubmitter(241): No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2012-02-06 09:44:51,163 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(192): Exit code from task is : 127
2012-02-06 09:44:51,165 WARN  [ContainersLauncher #0] launcher.ContainerLaunch(273): Container exited with a non-zero exit code 127
{code}

Fix TestHFileOutputFormat to work against hadoop 0.23
-----------------------------------------------------

Key: HBASE-5317
URL: https://issues.apache.org/jira/browse/HBASE-5317
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch

Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92:

Failed tests:
  testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
  test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
  testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
  testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Status: Patch Available (was: Open)

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
----------------------------------------------------------------------

Key: HBASE-5166
URL: https://issues.apache.org/jira/browse/HBASE-5166
Project: HBase
Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
Labels: multithreaded, tablemapper
Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch
Original Estimate: 0.5h
Remaining Estimate: 0.5h

There is currently no MultiThreadedTableMapper in HBase, analogous to Hadoop's MultiThreadedMapper for IO-bound jobs. Use case, web crawler: take input (URLs) from an HBase table and put the content (url, content) back into HBase. Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the CPU is not fully utilized (the job is network/IO bound). Moreover, I want to know whether it would be a good or bad idea to use HBase for this kind of use case.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201433#comment-13201433 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
Any comments?
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201439#comment-13201439 ]

Zhihong Yu commented on HBASE-5166:
-----------------------------------
MultithreadedTableMapper is missing the Apache license header.
{code}
+while(!executor.isTerminated()){
+  // wait till all the threads are done
+}
{code}
We should put a sleep() in the above loop and possibly limit the total duration of the wait.

A new unit test should be added for MultithreadedTableMapper. Please look at tests that use TableMapper.
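Ted's suggestion, sleep inside the loop and bound the total wait, is exactly what ExecutorService.awaitTermination already provides: it blocks without spinning and returns false when the deadline passes. A minimal sketch of replacing the busy loop, not the actual MultithreadedTableMapper patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownWaitDemo {
    // Replaces `while (!executor.isTerminated()) {}`: no spinning, bounded wait.
    static boolean shutdownAndWait(ExecutorService executor, long timeoutSec) {
        executor.shutdown(); // stop accepting new tasks, let running ones finish
        try {
            return executor.awaitTermination(timeoutSec, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            executor.submit(() -> { /* per-thread mapper work would run here */ });
        }
        boolean finished = shutdownAndWait(executor, 60);
        assert finished; // all tasks completed within the bound
    }
}
```

A caller can decide what to do when the method returns false (log and force shutdownNow(), or fail the task), which addresses the unbounded-wait concern.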
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201444#comment-13201444 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
For HBASE-5317-v1.patch, I think we shouldn't simply catch TableExistsException. We should add the missing Configuration parameters in MRv2 so that no TableExistsException occurs during the test.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201460#comment-13201460 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy for providing up-to-date config parameters to MRv2.
[jira] [Issue Comment Edited] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201460#comment-13201460 ]

Zhihong Yu edited comment on HBASE-5317 at 2/6/12 6:47 PM:
-----------------------------------------------------------
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy for providing up-to-date config parameters to MRv2 (when the hadoop.profile property carries the value 23).

was (Author: zhi...@ebaysf.com):
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with good strategy of providing up-to-date config parameters to MRv2.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201473#comment-13201473 ]

Gregory Chanan commented on HBASE-5317:
---------------------------------------
@Ted: I agree we shouldn't just catch the TableExistsException -- that's why I didn't initially post that part of the patch. My recollection of that issue was that the MiniMRCluster is creating a "target" directory in hbase.rootdir [I could be wrong about the exact location]. When we call table.getTableDescriptor(), it can't get a table descriptor for "target", so it throws a TableExistsException. Can the handleDeprecation call prevent the target directory from being created? Or are you thinking of something else?

It also seems a little strange to me that calling table.getTableDescriptor() tries to get table descriptors for *everything* in hbase.rootdir. Why should I get an exception thrown if HBase can't find a TableDescriptor for "target" when I am only asking about "table"?
[jira] [Created] (HBASE-5340) HFile/LoadIncrementalHFiles should specify file name when it fails to load a file.
HFile/LoadIncrementalHFiles should specify file name when it fails to load a file.
----------------------------------------------------------------------------------

Key: HBASE-5340
URL: https://issues.apache.org/jira/browse/HBASE-5340
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: Jonathan Hsieh

I was attempting to do a bulk load and got this error message. Unfortunately it didn't tell me which file had the problem.
{code}
Exception in thread "main" java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
        at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryLoad(LoadIncrementalHFiles.java:204)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:173)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:452)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:457)
{code}
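The missing context can be added at the per-file call site: catch the low-level IOException and rethrow it with the path attached, keeping the original failure as the cause. A hedged sketch; the method names and message wording here are illustrative, not the eventual HBASE-5340 patch:

```java
import java.io.IOException;

public class NamedLoadFailureDemo {
    // Stand-in for HFile trailer parsing that fails without naming the file.
    static void readTrailer() throws IOException {
        throw new IOException("Trailer 'header' is wrong; does the trailer size match content?");
    }

    // Wrap the low-level failure so the operator sees which file was bad.
    static void tryLoad(String hfilePath) throws IOException {
        try {
            readTrailer();
        } catch (IOException e) {
            throw new IOException("Failed to load HFile " + hfilePath, e);
        }
    }

    // Helper for demonstration: surface the wrapped message.
    static String loadErrorMessage(String hfilePath) {
        try {
            tryLoad(hfilePath);
            return "";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String msg = loadErrorMessage("hdfs://example/bulk/info-A/hfile1"); // hypothetical path
        assert msg.contains("hfile1"); // the file name is now in the error
    }
}
```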
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201488#comment-13201488 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
The handleDeprecation() call may not prevent the target directory from being created. We should try to find out why the 'target' directory is only created for hadoop 0.23.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201491#comment-13201491 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
Not every directory under hbase.rootdir is deemed to correspond to a table. See the following javadoc in FSUtils.getTableDirs():
{code}
 * @return All the table directories under <code>rootdir</code>. Ignore non table hbase folders such as
 * .logs, .oldlogs, .corrupt, .META., and -ROOT- folders.
{code}
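The quoted javadoc explains Gregory's "target" failure: the skip list only covers HBase's own folders, so anything else under hbase.rootdir is treated as a table directory. A stand-in sketch of that filter (the skip set mirrors the javadoc; this is not the FSUtils code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TableDirsDemo {
    // Non-table folders listed in the FSUtils.getTableDirs javadoc.
    static final Set<String> NON_TABLE_DIRS = new HashSet<>(Arrays.asList(
        ".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-"));

    // Everything under hbase.rootdir that is not in the skip set is
    // assumed to be a table directory.
    static List<String> tableDirs(List<String> rootDirEntries) {
        List<String> tables = new ArrayList<>();
        for (String dir : rootDirEntries) {
            if (!NON_TABLE_DIRS.contains(dir)) {
                tables.add(dir);
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(".logs", "-ROOT-", "TestTable", "target");
        // "target" is not in the skip list, so it is (wrongly) treated as a table,
        // which is why getTableDescriptor() later fails on it.
        assert tableDirs(entries).equals(Arrays.asList("TestTable", "target"));
    }
}
```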
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201515#comment-13201515 ]

jirapos...@reviews.apache.org commented on HBASE-5229:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/
-----------------------------------------------------------

(Updated 2012-02-06 19:51:58.341235)

Review request for hbase.

Changes
-------
Addressed a few comments. In addition, the client can retry a MultiRowMutation if the first row is not available in the region (as that might indicate that the region moved).

Summary
-------
This builds on HBASE-3584, HBASE-5203, and HBASE-5304. Multiple rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

This addresses bug HBASE-5229.
https://issues.apache.org/jira/browse/HBASE-5229 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 Diff: https://reviews.apache.org/r/3748/diff Testing --- Tests added to TestFromClientSide and TestAtomicOperation Thanks, Lars Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. 
Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
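The prefix-grouping idea in option #2 above can be sketched in a few lines. This is a hypothetical illustration, not code from any attached patch: a split policy would clamp any proposed split key to the configured prefix length, so a region boundary can never fall inside a grouping prefix.

```java
import java.util.Arrays;

// Sketch of option #2: clamp a proposed split point to a grouping-prefix
// boundary so a region is never split inside a prefix group. The class and
// method names are illustrative only.
class PrefixSplitSketch {
    // Truncate the proposed split key to the configured prefix length so all
    // rows sharing the prefix stay in the same region.
    static byte[] clampToPrefix(byte[] proposedSplitKey, int prefixLength) {
        if (proposedSplitKey.length <= prefixLength) {
            return proposedSplitKey;
        }
        return Arrays.copyOf(proposedSplitKey, prefixLength);
    }

    public static void main(String[] args) {
        // Group on the 7-byte "tenantA" prefix of the row key.
        byte[] split = clampToPrefix("tenantA-row17".getBytes(), 7);
        System.out.println(new String(split)); // prints "tenantA"
    }
}
```

In a real RegionSplitPolicy the prefix length would come from the HTableDescriptor; byte-wise truncation like this assumes fixed-length prefixes.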
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201529#comment-13201529 ] Phabricator commented on HBASE-5074: todd has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. I haven't thought about it quite enough, but is there any way to do this without leaking the HFileSystem out to the rest of the code? As Ted pointed out, there are some somewhat public interfaces that will probably get touched by that, and the number of places it has required changes in unrelated test cases seems like a code smell to me. Maybe this could be a static cache somewhere, that given a FileSystem instance, it maintains the un-checksummed equivalents thereof as weak references? Then the concept would be self-contained within the HFile code, which until now has been a fairly standalone file format. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
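Todd's weak-reference cache could look roughly like the sketch below. Everything here is hypothetical and stands in for the real types (Hadoop's FileSystem is replaced by Object so the snippet is self-contained); the point is only that a static map with weak keys keeps the concept inside the HFile code without touching public interfaces.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a static cache that, given a FileSystem instance (stubbed here as
// Object), hands back its checksum-disabled equivalent. Weak keys let an
// unused FileSystem, and its cached twin, be garbage collected.
class NoChecksumFsCache {
    private static final Map<Object, Object> CACHE = new WeakHashMap<>();

    static synchronized Object getNoChecksumFs(Object fs) {
        // Note: the cached value must not hold a strong reference back to the
        // key, or the weak entry can never be collected.
        return CACHE.computeIfAbsent(fs, f -> createNoChecksumEquivalent());
    }

    private static Object createNoChecksumEquivalent() {
        // In a real implementation this would return a wrapper with HDFS
        // checksum verification turned off.
        return new Object();
    }
}
```

The synchronized accessor is the simplest way to make WeakHashMap safe for concurrent readers; a real implementation might prefer a concurrent map with weak keys.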
[jira] [Commented] (HBASE-5336) Spurious exceptions in HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201536#comment-13201536 ] Lars Hofhansl commented on HBASE-5336: -- Interestingly I find no matching logs on the RegionServers or Datanodes. I feel like I have seen a jira about this before, but I cannot find it. Spurious exceptions in HConnectionImplementation Key: HBASE-5336 URL: https://issues.apache.org/jira/browse/HBASE-5336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl I have seen this on the client a few times during heavy write testing:
java.util.concurrent.ExecutionException: java.io.IOException: java.io.IOException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730)
    at NewsFeedCreate.insert(NewsFeedCreate.java:91)
    at NewsFeedCreate$1.run(NewsFeedCreate.java:38)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: java.io.IOException: java.lang.NullPointerException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363)
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy1.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
    ... 7 more
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. 
Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201563#comment-13201563 ] Zhihong Yu commented on HBASE-5229: --- For my first comment, RowMutation maintains a single row in internalAdd(). So it should be fine passing the row directly to internalMutate(). Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
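The single-row invariant referred to above can be illustrated with a stand-alone sketch of an internalAdd()-style check (the names are illustrative, not the actual RowMutation code): a container created for one row rejects mutations addressed to any other row, which is what makes it safe to pass that row on to internalMutate().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of a RowMutation-like container enforcing that every added
// mutation targets the row the container was created for.
class RowMutationsSketch {
    private final byte[] row;
    private final List<byte[]> mutationRows = new ArrayList<>();

    RowMutationsSketch(byte[] row) {
        this.row = row;
    }

    // Mirrors the internalAdd() idea: reject a mutation whose row differs.
    void add(byte[] mutationRow) {
        if (!Arrays.equals(row, mutationRow)) {
            throw new IllegalArgumentException("mutation row does not match");
        }
        mutationRows.add(mutationRow);
    }

    byte[] getRow() {
        return row;
    }
}
```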
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201569#comment-13201569 ] Phabricator commented on HBASE-5074: mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. @dhruba; thanks for the fixes! Here are some more comments (I still have to go through the last 25% of the new version of the patch). INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 Please address this comment. The javadoc says major and the variable name says minor. src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 Please correct the misspelling. src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:352 I think this function needs to be renamed to expectAtLeastMajorVersion for clarity src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 I think we should either consistently use the onDiskSizeWithHeader field or get rid of it. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Please do use a constant instead of 0 here for the minor version. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3551 Long line src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:60 This lazy initialization is not thread-safe. This also applies to other enum members below. Can the meth field be initialized on the enum constructor, or do we rely on some classes being loaded by the time this initialization is invoked? 
src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:63-67 Avoid repeating org.apache.hadoop.util.PureJavaCrc32 three times in string form src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:74-75 Avoid repeating the java.util.zip.CRC32 string src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:98-99 Avoid repeating the string src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:132 Fix indentation src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:174 Fix indentation src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:71 Inconsistent formatting: 1024 +980. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
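The thread-safety comment on ChecksumType.java:60 can be addressed as the reviewer suggests, by resolving the checksum class once in the enum constructor (enum constants are initialized during class initialization, which the JVM performs under a lock) and keeping each class name in a single string. A hedged sketch, not the committed code:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of eager, thread-safe checksum resolution in an enum constructor.
// The Hadoop class names come from the review comment; the enum shape is
// illustrative only.
enum ChecksumTypeSketch {
    CRC32C("org.apache.hadoop.util.PureJavaCrc32C"),
    CRC32_JAVA("org.apache.hadoop.util.PureJavaCrc32");

    private final Class<?> impl; // resolved once, at class-init time

    ChecksumTypeSketch(String className) {
        Class<?> c;
        try {
            c = Class.forName(className); // pure-Java impl if on the classpath
        } catch (ClassNotFoundException e) {
            c = CRC32.class;              // fall back to java.util.zip.CRC32
        }
        this.impl = c;
    }

    Checksum newChecksum() {
        try {
            return (Checksum) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate checksum", e);
        }
    }
}
```

Because the reflective lookup happens exactly once per constant, the lazy-initialization race and the repeated class-name strings both disappear; the only cost is that a missing class is discovered at class-load time rather than first use.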
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201614#comment-13201614 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Todd: I agree with you. It is messy that the HFileSystem interface is leaking out to the unit tests. Instead, inside HFile, I can do something like this when a Reader is created: if (!(fs instanceof HFileSystem)) { fs = new HFileSystem(fs); } What this means is that users of HFile that already pass in an HFileSystem will get the new behaviour, while HRegionServer anyway voluntarily creates an HFileSystem before invoking HFile, so it works. I did not do this earlier because I thought that 'using reflection' is costly, but on second thought the cost is not much because it will be done only once when a new reader is created for the first time. What do you think? REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
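The wrapping described in the comment above (with the parentheses the instanceof test needs) amounts to something like this sketch; the stub types replace Hadoop's FileSystem and the patch's HFileSystem so the snippet compiles on its own.

```java
// Stand-ins for Hadoop's FileSystem and the patch's HFileSystem wrapper.
class FileSystemStub {}

class HFileSystemStub extends FileSystemStub {
    final FileSystemStub wrapped;
    HFileSystemStub(FileSystemStub fs) { this.wrapped = fs; }
}

// Sketch of the reader-creation path: callers that already pass an
// HFileSystem keep it; plain FileSystems get wrapped exactly once.
class ReaderFactorySketch {
    static HFileSystemStub ensureHFileSystem(FileSystemStub fs) {
        // Runs once per reader, off the hot path, so the cost is negligible.
        if (!(fs instanceof HFileSystemStub)) {
            fs = new HFileSystemStub(fs);
        }
        return (HFileSystemStub) fs;
    }
}
```

This keeps HFileSystem confined to the hfile package, which is the outcome both reviewers were after.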
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201619#comment-13201619 ] Phabricator commented on HBASE-5074: todd has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Yea, I think the instanceof check and confining HFileSystem to be only within the hfile package is much better. I don't think it should be costly -- as you said, it's only when the reader is created, which isn't on the hot code path, and instanceof checks are actually quite fast. They turn into a simple compare of the instance's klassid header against a constant, if I remember correctly. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. 
Obviously this is an advanced feature, and this is prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. 
--- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here.
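The co-location idea above (keep all of a tenant's rows in one region so that one multi-row atomic operation can cover them) can be sketched with a prefixed row-key scheme. This is an illustration only, not code from the HBASE-5229 patch; the class name, delimiter, and split-point helper are assumptions:

```java
// Illustrative sketch: prefixing row keys with a tenant id keeps that
// tenant's rows lexicographically contiguous, so presplitting the table
// on tenant boundaries places them all in the same region.
public class TenantRowKeys {
  private static final char SEP = '|'; // assumed delimiter

  /** Builds a row key that sorts all of a tenant's rows together. */
  public static String rowKey(String tenant, String row) {
    return tenant + SEP + row;
  }

  /** A presplit boundary that falls just after the tenant's last key. */
  public static String splitPointAfter(String tenant) {
    return tenant + (char) (SEP + 1);
  }
}
```

Note that this simple scheme assumes tenant ids contain no characters at or above the delimiter; a real key design would escape or length-prefix the tenant id.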
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201679#comment-13201679 ] Phabricator commented on HBASE-5074: mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Some more comments. I am still concerned about the copy-paste stuff in backwards-compatibility checking. Is there a way to minimize that? I also mentioned this in the comments below, but it would probably make sense to add more canned files in the no-checksum format generated by the old writer and read them with the new reader, the same way HFile v1 compatibility is ensured. I don't mind keeping the old writer code around in the unit test, but I think it is best to remove as much code from that legacy writer as possible (e.g. versatile API, toString, etc.) and only leave the parts necessary to generate the file for testing. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:164 Long line src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:83 Can this be made private if it is not accessed outside of this class? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:78 Use ALL_CAPS for constants src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 There seems to be a lot of copy-and-paste from the old HFileBlock code here. Is there a way to reduce that? I think we also need to create some canned old-format HFiles (using the old code) and read them with the new reader code as part of the test. src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 Make this class final. Also, it would make sense to strip this class down as much as possible to maintain the bare minimum of code required to test compatibility (if you have not done that already). 
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Do we ever use this function? src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java:188 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java:356 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java:300 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
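As described earlier in this thread, the patch prefers a pure-Java CRC implementation when one is on the classpath and otherwise falls back to java.util.zip.CRC32. A minimal sketch of that reflection-based selection follows; the factory shape is an assumption for illustration and is not the actual patch code:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of reflection-based checksum selection: try to load Hadoop's
// PureJavaCrc32, and fall back to java.util.zip.CRC32 when the class
// is not available. Both implement java.util.zip.Checksum.
public class ChecksumFactory {
  private static final String PURE_JAVA_CRC32 =
      "org.apache.hadoop.util.PureJavaCrc32"; // assumed class name

  public static Checksum newCrc32() {
    try {
      return (Checksum) Class.forName(PURE_JAVA_CRC32)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      return new CRC32(); // fallback path when the pure-Java CRC is absent
    }
  }
}
```

Since both implementations produce standard CRC-32 values, callers can swap them freely; the thread also notes the patch reworks this reflection path to keep per-object creation overhead low.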
[jira] [Created] (HBASE-5341) HBase build artifact should include security code by default
HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar is in the Maven repo at repository.a.o. I see no reason to do a separate artifact for the security-related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security-related code is loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile. -- This message is automatically generated by JIRA.
[jira] [Updated] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5341: - Component/s: security build
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201720#comment-13201720 ] Enis Soztutar commented on HBASE-5341: -- I can provide a patch, if we agree on this.
[jira] [Created] (HBASE-5342) Grant/Revoke global permissions
Grant/Revoke global permissions --- Key: HBASE-5342 URL: https://issues.apache.org/jira/browse/HBASE-5342 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar HBASE-3025 introduced simple ACLs based on coprocessors. It defines global/table/cf/cq level permissions. However, there is no way to grant/revoke global level permissions, other than the hbase.superuser conf setting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201818#comment-13201818 ] Enis Soztutar commented on HBASE-5341: -- Also, there is no secure artifact in the Maven repo, so depending on when 0.92.1 is cut, we might want to push 0.92.0-security as well.
[jira] [Created] (HBASE-5343) Access control API in HBaseAdmin.java
Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201841#comment-13201841 ] Gregory Chanan commented on HBASE-5317: --- @Ted: FSUtils.getTableDirs() excludes a specific list of directories. Specifically: {code} Arrays.asList(new String[]{ HREGION_LOGDIR_NAME, HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, Bytes.toString(META_TABLE_NAME), Bytes.toString(ROOT_TABLE_NAME), SPLIT_LOGDIR_NAME })); {code} So if the MiniMRCluster creates a target directory, it will be returned. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. 
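The exclusion-based filtering quoted above can be sketched as follows. The string values below are assumed values for the quoted constants (HREGION_LOGDIR_NAME, etc.), used only to illustrate the point under discussion: anything not on the exclusion list, such as a stray target directory created by MiniMRCluster, is treated as a table directory and returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of FSUtils.getTableDirs-style filtering: every directory under
// the HBase root that is not on the exclusion list counts as a table dir.
public class TableDirFilter {
  private static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList(
      ".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-", "splitlog"));

  public static List<String> tableDirs(List<String> rootDirs) {
    List<String> tables = new ArrayList<>();
    for (String dir : rootDirs) {
      if (!EXCLUDED.contains(dir)) {
        tables.add(dir); // e.g. an unexpected "target" dir lands here
      }
    }
    return tables;
  }
}
```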
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201848#comment-13201848 ] Zhihong Yu commented on HBASE-5317: --- Right. Can we find out why this target directory was created ?
[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams
[ https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201862#comment-13201862 ] Jean-Daniel Cryans commented on HBASE-3134: --- We can't hit ZK every time we replicate in order to see what the state is; each RS should instead have a watcher, and the check should be done locally. The rest looks good, thanks a lot for working on this. [replication] Add the ability to enable/disable streams --- Key: HBASE-3134 URL: https://issues.apache.org/jira/browse/HBASE-3134 Project: HBase Issue Type: New Feature Components: replication Reporter: Jean-Daniel Cryans Assignee: Teruyoshi Zenmyo Priority: Minor Labels: replication Fix For: 0.94.0 Attachments: HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch This jira was initially in the scope of HBASE-2201, but was pushed out since it has low value compared to the required effort (and we wanted to ship 0.90.0 rather soon). We need to design a way to enable/disable replication streams in a determinate fashion.
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201867#comment-13201867 ] Zhihong Yu commented on HBASE-5341: --- If we remove the maven 'security' profile, only secure HBase artifacts would be built, right ? Since most users wouldn't be using secure HBase features, I think this might introduce confusion for them.
[jira] [Commented] (HBASE-5333) Introduce Memstore backpressure for writes
[ https://issues.apache.org/jira/browse/HBASE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201880#comment-13201880 ] Jean-Daniel Cryans commented on HBASE-5333: --- I've done some brainstorming with Stack and the result was HBASE-5162. Introduce Memstore backpressure for writes Key: HBASE-5333 URL: https://issues.apache.org/jira/browse/HBASE-5333 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Currently if the memstore/flush/compaction cannot keep up with the write load, we block writers for up to hbase.hstore.blockingWaitTime milliseconds (default is 90000). Would be nice if there was a concept of a soft backpressure that slows writing clients gracefully *before* we reach this condition. From the log: 2012-02-04 00:00:06,963 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region table,,1328313512779.c2761757621ddf8fb78baf5288d71271. has too many store files; delaying flush up to 90000ms
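One way to picture the "soft" backpressure proposed above: instead of a hard block once the store-file threshold is hit, delay each write by an amount that grows as the threshold approaches. The limits and the linear ramp below are illustrative assumptions, not anything from HBase:

```java
// Illustrative sketch of graceful write backpressure: delay grows
// linearly from zero at a soft store-file limit up to a cap just
// before the hard limit where HBase would block outright.
public class SoftBackpressure {
  static final int SOFT_LIMIT = 7;      // start slowing writers (assumed)
  static final int HARD_LIMIT = 10;     // hard blocking threshold (assumed)
  static final long MAX_DELAY_MS = 100; // per-write delay cap (assumed)

  /** Milliseconds to delay a write, given the current store-file count. */
  public static long delayFor(int storeFiles) {
    if (storeFiles <= SOFT_LIMIT) {
      return 0; // healthy: no backpressure
    }
    int over = Math.min(storeFiles, HARD_LIMIT) - SOFT_LIMIT;
    return MAX_DELAY_MS * over / (HARD_LIMIT - SOFT_LIMIT);
  }
}
```

The appeal of a ramp like this is that clients slow down smoothly well before the hard limit, rather than running at full speed and then stalling for the full blocking wait.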
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201881#comment-13201881 ] Enis Soztutar commented on HBASE-5341: -- The only artifact built will be plain 0.92.1 or 0.94 (no -security appended), but it will include the security-related code. It's like Hadoop 1.0.0, which includes the security-related code in one artifact.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201884#comment-13201884 ] Jean-Daniel Cryans commented on HBASE-5267: --- I'm +1 with the patch, but eventually I'd still like to see something in the book about it. Add a configuration to disable the slab cache by default Key: HBASE-5267 URL: https://issues.apache.org/jira/browse/HBASE-5267 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Li Pi Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt From what I commented at the tail of HBASE-4027: {quote} I changed the release note, the patch doesn't have a hbase.offheapcachesize configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize (which is actually a big problem when you consider this: http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). {quote} We need to add hbase.offheapcachesize and set it to false by default. Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
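Per the ticket text, the fix amounts to gating the slab cache behind a configuration key that defaults to off, so that merely setting -XX:MaxDirectMemorySize no longer enables it. The property name is taken from the ticket; its exact semantics here are assumed for illustration:

```xml
<!-- hbase-site.xml: keep the experimental off-heap slab cache disabled
     even when -XX:MaxDirectMemorySize is set (proposed in HBASE-5267) -->
<property>
  <name>hbase.offheapcachesize</name>
  <value>false</value>
</property>
```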
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201885#comment-13201885 ] Zhihong Yu commented on HBASE-5341: --- The certification for 0.92.0 was for the insecure HBase artifact. If we only produce one secure artifact, would the certification process change ?
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201898#comment-13201898 ] Enis Soztutar commented on HBASE-5341: -- Sorry, I did not understand what you are referring to with the certification process. Do you mean voting for the RC, signing the release, etc?
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201901#comment-13201901 ] Zhihong Yu commented on HBASE-5267: --- I was about to create a sub-task but found that '3.2.10. Experimental off-heap cache' points to: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ Would an update of that blog suffice ?
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201907#comment-13201907 ] Zhihong Yu commented on HBASE-5341: --- Yes.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201909#comment-13201909 ] Jean-Daniel Cryans commented on HBASE-5267: --- Pointing to it might be a good option, along with a line or two on how to use it.
[jira] [Updated] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5267: -- Attachment: 5267v4.txt Patch v4 adds the new config parameter to src/docbkx/upgrading.xml
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201917#comment-13201917 ] Enis Soztutar commented on HBASE-5341: -- I don't see a reason for changing the release process. The vote for the 0.92.0 release included both the plain and secure artifacts, see http://comments.gmane.org/gmane.comp.java.hadoop.hbase.devel/25671
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201928#comment-13201928 ] Zhihong Yu commented on HBASE-5341: --- If you search for the voting process by entering the following in http://search-hadoop.com: 'ANN: The fifth hbase 0.92.0 release candidate is available for download' one can hardly tell whether the voters tested with the secure HBase tarball. If we produce one artifact (I think we should), some voters have to test security features before we declare a new release.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201964#comment-13201964 ] Hadoop QA commented on HBASE-5267: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513543/5267v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/909//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/909//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/909//console This message is automatically generated. 
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201983#comment-13201983 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 can you pl elaborate more on this comment? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 I think it is better to keep the compatibility code separate from existing live-test code. That way, it is guaranteed to never change. is there any other existing unit test that keeps a version1 file to run unit tests against? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 I did not strip it down, just so that it remains as it was earlier. This is for backward-compatibility, so isn't it better to keep as it was? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Was useful while testing, but I will get rid of it. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201990#comment-13201990 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 But CRC32C is not installed by default. You would need hadoop 2.0 (not yet released) to get that. REVISION DETAIL https://reviews.facebook.net/D1521
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201997#comment-13201997 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/#review4852 --- I tried to use the command line tool to compress an HLog written by 0.92 and got the following: Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192) at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104) at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64) Also, if you use the command line tool with no arguments, it should print its help (right now it prints an IndexOutOfBoundsException). I'll try again with an hlog written by trunk - I'm guessing the hlog serialization version might have changed or something. - Todd On 2012-01-24 22:29:18, Li Pi wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2740/ bq. --- bq. bq. (Updated 2012-01-24 22:29:18) bq. bq. bq. Review request for hbase, Eli Collins and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. HLog compression. Has unit tests and a command line tool for compressing/decompressing. bq. bq. bq. This addresses bug HBase-4608. bq. https://issues.apache.org/jira/browse/HBase-4608 bq. bq. bq. Diffs bq. - bq. bq. src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2740/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Li bq. bq. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
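The dictionary idea in the HBASE-4608 description above can be sketched in a few lines. This is an illustrative stand-in, not the LRUDictionary class from the patch: a repeated string (table name, region id, CF name) is written literally the first time it appears, and referred to by a short index afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of WAL dictionary compression. Names and the unbounded
// map are assumptions; the real implementation is LRU-bounded.
class WalDictionary {
    private final Map<String, Short> toIndex = new HashMap<>();
    private final Map<Short, String> toEntry = new HashMap<>();
    private short next = 0;

    /** Returns the existing index, or -1 after registering a new entry. */
    public short findOrAdd(String entry) {
        Short idx = toIndex.get(entry);
        if (idx != null) {
            return idx;               // subsequent writes emit just this index
        }
        toIndex.put(entry, next);
        toEntry.put(next, entry);
        next++;
        return -1;                    // caller writes the literal bytes once
    }

    /** Reader side: resolve an index back to the original bytes. */
    public String lookup(short idx) {
        return toEntry.get(idx);
    }
}
```

On the reader side the dictionary is rebuilt in the same order, so a short index is enough to recover the full string without storing it again in every WAL entry.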
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202000#comment-13202000 ] Enis Soztutar commented on HBASE-5341: -- Agreed. Conceptually, security-related features are not very different from other features. They can be included in the code base, disabled by default, and marked as experimental if not tested well.
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202024#comment-13202024 ] Phabricator commented on HBASE-5074: tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 I don't see PureJavaCrc32 in hadoop 1.0 either. I think it would be nice to default to the best checksum class. src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 Would hbase.hstore.checksum.algo be a better name for this config parameter? REVISION DETAIL https://reviews.facebook.net/D1521
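The checksum-selection question in the comments above (prefer a pure-Java CRC implementation when it is on the classpath, otherwise fall back to java.util.zip.CRC32) can be sketched with a reflective factory. The class name below matches Hadoop's org.apache.hadoop.util.PureJavaCrc32, but whether it is available depends on the Hadoop version, so the fallback path is the only guaranteed one; this is a sketch, not the patch's actual ChecksumFactory.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hedged sketch: probe for a preferred Checksum class once, via
// reflection, and fall back to java.util.zip.CRC32 when absent.
class ChecksumPicker {
    private static final Class<?> PREFERRED =
        tryLoad("org.apache.hadoop.util.PureJavaCrc32");

    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null; // not on the classpath; use the JDK fallback
        }
    }

    public static Checksum newChecksum() {
        if (PREFERRED != null) {
            try {
                return (Checksum) PREFERRED.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                // fall through to the JDK implementation
            }
        }
        return new CRC32();
    }
}
```

Resolving the Class once in a static field keeps the per-block cost to a constructor call, which is the kind of "lower overhead" reworking the review thread mentions.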
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202026#comment-13202026 ] Phabricator commented on HBASE-5292: mbautin has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent counting getSize on compactions. @zhiqiu: does this problem exist in the open-source HBase trunk? If so, could you please port this patch to trunk? If this is not applicable to trunk, could you please set the JIRA status to resolved? Thanks! REVISION DETAIL https://reviews.facebook.net/D1527 getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated for both client-initiated Get/Scan operations as well as for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions.
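The fix the description calls for reduces to guarding the per-CF getsize increment with a compaction flag, so internal reads never touch the client-facing metric. A schematic, not the actual StoreScanner code; the class and method names below are illustrative:

```java
// Hedged sketch of the HBASE-5292 fix: only bytes returned to clients
// update the getsize metric; compaction scanners skip the increment.
class GetSizeMetric {
    private long bytesReturned = 0;
    private final boolean isCompaction; // assumed flag set at scanner creation

    GetSizeMetric(boolean isCompaction) {
        this.isCompaction = isCompaction;
    }

    /** Called when the query matcher returns an INCLUDE* code. */
    void onKeyValueIncluded(int kvLength) {
        if (!isCompaction) {
            bytesReturned += kvLength; // client-initiated Get/Scan only
        }
    }

    long getBytesReturned() {
        return bytesReturned;
    }
}
```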
[jira] [Created] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover
[89-fb] Scan unassigned region directory on master failover --- Key: HBASE-5344 URL: https://issues.apache.org/jira/browse/HBASE-5344 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin In case the master dies after a regionserver writes region state as OPENED or CLOSED in ZK but before the update is received by master and written to meta, the new master that comes up has to pick up the region state from ZK and write it to meta. Otherwise we can get multiply-assigned regions.
[jira] [Updated] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover
[ https://issues.apache.org/jira/browse/HBASE-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5344: --- Attachment: D1605.1.patch mbautin requested code review of [jira] [HBASE-5344] [89-fb] Scan unassigned region directory on master failover. Reviewers: Kannan, Karthik, Liyin, JIRA, stack In case the master dies after a regionserver writes region state as OPENED or CLOSED in ZK but before the update is received by master and written to meta, the new master that comes up has to pick up the region state from ZK and write it to meta. Otherwise we can get multiply-assigned regions. The current solution tries to reassign the root region if it is unassigned but does not implement a work-around if META regions are missing. Also, it currently heavily relies on direct scanning of regionservers (reading regionserver list from ZK and doing an RPC on each regionserver to get the list of online regions). We were already doing that in master failover, but I am making it parallel here. 
TEST PLAN Unit tests, dev cluster, dark launch with killing regionservers and master REVISION DETAIL https://reviews.facebook.net/D1605 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionEventData.java src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java src/main/java/org/apache/hadoop/hbase/master/DirectRegionServerScanner.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java src/main/java/org/apache/hadoop/hbase/master/RegionManager.java src/main/java/org/apache/hadoop/hbase/master/RootScanner.java src/main/java/org/apache/hadoop/hbase/master/ServerManager.java src/main/java/org/apache/hadoop/hbase/master/ZKUnassignedWatcher.java src/main/java/org/apache/hadoop/hbase/master/handler/MasterOpenRegionHandler.java src/test/java/org/apache/hadoop/hbase/master/TestRegionStateOnMasterFailure.java
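The parallelized regionserver scan mentioned in the revision summary (one "list online regions" RPC per server, issued concurrently instead of serially) can be sketched with an executor. This is a sketch under stated assumptions: listOnlineRegions stands in for the real RPC and is not an actual HBase API here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hedged sketch: fan the per-server RPCs out to a thread pool and
// collect the results, so failover latency is bounded by the slowest
// server rather than the sum over all servers.
class ParallelRegionScan {
    static List<String> scan(List<String> servers,
            Function<String, List<String>> listOnlineRegions)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String server : servers) {
                // one simulated RPC per regionserver, in parallel
                futures.add(pool.submit(() -> listOnlineRegions.apply(server)));
            }
            List<String> allRegions = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                allRegions.addAll(f.get());
            }
            return allRegions;
        } finally {
            pool.shutdown();
        }
    }
}
```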
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202032#comment-13202032 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/#review4853 --- I tried the compression tool on a log created by YCSB in load mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think. I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10650 invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10651 I think the better way of expressing this usage would be: WALCompressor [-u | -c] input output -u - uncompresses the input log -c - compresses the output log Exactly one of -u or -c must be specified src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10649 this code doesn't work properly. 
Here's what you want to do: Configuration conf = new Configuration(); FileSystem fs = path.getFileSystem(conf); - Todd On 2012-01-24 22:29:18, Li Pi wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2740/ bq. --- bq. bq. (Updated 2012-01-24 22:29:18) bq. bq. bq. Review request for hbase, Eli Collins and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. HLog compression. Has unit tests and a command line tool for compressing/decompressing. bq. bq. bq. This addresses bug HBase-4608. bq. https://issues.apache.org/jira/browse/HBase-4608 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2740/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Li bq. bq. 
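Todd's inline comments above boil down to two small fixes in the tool's argument handling, plus obtaining the FileSystem from the input Path itself (path.getFileSystem(conf)) rather than the default filesystem. The argument-handling part can be sketched in plain Java; the class and method names here are illustrative, not the actual Compressor code:

```java
// Hypothetical argument handling for the WALCompressor tool discussed above,
// illustrating Todd's two review points: check args.length BEFORE indexing
// into args (otherwise running the tool with no arguments throws
// ArrayIndexOutOfBoundsException), and accept exactly one of -u / -c.
class WALCompressorArgs {
  static String validate(String[] args) {
    // The length check must come first in the || chain, or args[0] below
    // is evaluated against an empty array.
    if (args.length != 3 || (!args[0].equals("-u") && !args[0].equals("-c"))) {
      return "Usage: WALCompressor [-u | -c] input output";
    }
    return args[0].equals("-u") ? "uncompress" : "compress";
  }

  public static void main(String[] args) {
    System.out.println(validate(args));
  }
}
```

Because `||` short-circuits left to right, putting the length check first guarantees the array is never indexed when it is empty, which is exactly the inversion the review asks for.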
HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current
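The zero-byte census Todd describes ("a quick C program to count the number of 0x00 bytes") is easy to reproduce. A hedged Java equivalent (the command-line handling and fallback buffer are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Java equivalent of the "quick C program" mentioned above: count 0x00 bytes
// in a log file as a rough estimate of how much a varint/compression pass
// could still save. Pass a WAL file path, or run with no args for a demo.
class ZeroByteCounter {
  static long countZeros(byte[] data) {
    long zeros = 0;
    for (byte b : data) {
      if (b == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0]))
                                  : new byte[] {0, 'k', 0, 0, 'v'};
    System.out.printf("%d of %d bytes are 0x00%n", countZeros(data), data.length);
  }
}
```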
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202050#comment-13202050 ] Phabricator commented on HBASE-5292: zhiqiu has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent counting getSize on compactions. @mbautin Sure. I'll port it to open-source trunk right now. Thank you so much for reminding me of this. :D REVISION DETAIL https://reviews.facebook.net/D1527 getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated for both client-initiated Get/Scan operations as well as for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5301) Some RegionServer metrics have really confusing names
[ https://issues.apache.org/jira/browse/HBASE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated HBASE-5301: Component/s: metrics Description: Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric. There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name? On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. was: Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric.
There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name? On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. Labels: metrics (was: ) Some RegionServer metrics have really confusing names - Key: HBASE-5301 URL: https://issues.apache.org/jira/browse/HBASE-5301 Project: HBase Issue Type: Bug Components: metrics Reporter: Doug Meil Labels: metrics Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric. There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name?
On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202057#comment-13202057 ] Anoop Sam John commented on HBASE-2038: --- Hi Alex, Thanks for your reply. Yes, I had seen your past comment. I am checking the trunk coprocessor code for this work as of now. What is your comment on my first comment, that HRegionServer next(final long scannerId, int nbRows) calls the coprocessor preScannerNext() passing the RegionScanner? On this we cannot make a seek(). Thanks Anoop Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so that region level indexing can be reimplemented as a coprocessor without any loss of functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan resolved HBASE-5321. --- Resolution: Fixed Committed to 0.90. this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90. Key: HBASE-5321 URL: https://issues.apache.org/jira/browse/HBASE-5321 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5321.patch In HBASE-5160 we do not wait for TM to assign the regions after the first RS comes online. After doing this the variable this.allRegionServersOffline needs to be reset which is not done in 0.90. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Attachment: HBASE-5323.patch Patch for 0.90 Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7 Attachments: HBASE-5323.patch, HBASE-5323.patch We know that while parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, it lands in the run method of EventHandler. This kills the SSH thread and so further assignment does not happen; if ROOT and META are to be assigned, they cannot be. I think in this condition we should abort the master by catching such exceptions. Please do suggest. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
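The proposed fix can be sketched abstractly. This is a toy model with hypothetical names, not the actual EventHandler/ServerShutdownHandler code: the point is that run() catches Throwable (which includes AssertionError from the HLog length check) and escalates to an abort, instead of letting the error silently kill the handler thread and strand ROOT/META assignment.

```java
// Toy model of the fix proposed above. An AssertionError is an Error, not an
// Exception, so a plain catch (Exception e) would still let it kill the
// thread; catching Throwable and aborting makes the failure loud.
class EventHandlerSketch {
  interface Server {
    void abort(String why, Throwable cause);
  }

  static void run(Server server, Runnable process) {
    try {
      process.run();
    } catch (Throwable t) { // includes AssertionError from HLog length checks
      server.abort("Caught throwable while processing event", t);
    }
  }
}
```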
[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java
[ https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202081#comment-13202081 ] Gary Helmling commented on HBASE-5343: -- Adding coprocessor specific methods to {{HBaseAdmin}} completely undermines the purpose of coprocessors as optionally enabled extensions, and fails to scale as features are added. Having {{HBaseAdmin}} be a jumble of methods related to specific coprocessors is not very user friendly either. Security usage requires that {{SecureRpcEngine}} be loaded and that {{AccessController}} be enabled. Yes, configuring these components is more complicated than it needs to be right now. But providing interfaces to these two optional components as a permanent part of the client-facing API presented by {{HBaseAdmin}} is not the solution. If {{AccessControllerProtocol}} is too difficult to work with, then I think we would be better off with a simple client helper, like a {{SecurityClient}} class similar to the {{Constraints}} helper that was implemented for the constraints coprocessor. Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
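Gary's suggested alternative can be sketched as a thin static facade, modeled on the Constraints helper. Everything here is hypothetical: the AccessProtocol interface and the grant() signature are illustrative stand-ins, not the real AccessControllerProtocol API.

```java
// Hypothetical sketch of the SecurityClient helper suggested above: a static
// facade so callers never touch the coprocessor endpoint directly, and
// HBaseAdmin stays free of coprocessor-specific methods.
class SecurityClientSketch {
  interface AccessProtocol { // stand-in for the coprocessor endpoint
    void grant(String user, String table, String actions);
  }

  static void grant(AccessProtocol endpoint, String user, String table,
                    String actions) {
    // A real helper would locate the endpoint from an HTable and handle
    // serialization; the facade keeps that plumbing out of the client API.
    endpoint.grant(user, table, actions);
  }
}
```

The design point is that the helper ships with the optional component, so clusters that never enable security never see the API.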
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202088#comment-13202088 ] Lars Hofhansl commented on HBASE-2038: -- Unfortunately there is no seeking in the coprocessors, yet. They work more like a filter on a real scan. Seeking is done one (or actually two) levels deeper: seeking happens in the StoreScanners, while coprocessors see RegionScanners. It is not entirely clear to me where to hook this up in that API. It might be possible to provide a custom filter to do that. Filters operate at the StoreScanner level, and so can (and do) provide seek hints to the calling scanner. Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so that region level indexing can be reimplemented as a coprocessor without any loss of functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
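The seek-hint mechanism Lars mentions can be modeled in a few lines. Real HBase expresses this through the Filter API's SEEK_NEXT_USING_HINT return code and getNextKeyHint(); the types below are deliberately simplified stand-ins, not the actual classes.

```java
// Simplified model of filter-driven seeking: the filter returns a "seek with
// hint" code plus a target key, and the StoreScanner layer (not the
// coprocessor layer) performs the actual reseek.
class SeekHintSketch {
  enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

  interface HintingFilter {
    ReturnCode filterKey(String key);
    String getNextKeyHint(String key); // consulted only on SEEK_NEXT_USING_HINT
  }

  // An index-style filter that jumps straight to keys with a given prefix
  // instead of scanning every row in between.
  static HintingFilter prefixSeeker(final String prefix) {
    return new HintingFilter() {
      public ReturnCode filterKey(String key) {
        return key.startsWith(prefix) ? ReturnCode.INCLUDE
                                      : ReturnCode.SEEK_NEXT_USING_HINT;
      }

      public String getNextKeyHint(String key) {
        return prefix; // seek target: the first key at the prefix
      }
    };
  }
}
```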
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202091#comment-13202091 ] Gary Helmling commented on HBASE-5341: -- This would break the ability to compile HBase 0.92+ against Hadoop releases without security. Even though we currently compile against 1.0 by default, we haven't blocked the ability to compile against previous versions. So this would be a change, especially if there's anyone out there running on builds of the 0.20-append branch. We've also been discussing moving in the direction of a modular build (see HBASE-4336). The current security/ tree is practically a module already, just lacking its own pom.xml. Moving security/ back up into src/ would be a step in the opposite direction, keeping us with a monolithic release. This may make the packaging slightly simpler, but it still won't make it any easier to test out all the combinations of optional components. The security profile does currently use its own configuration for testing, so at least we get execution of the full test suite using {{SecureRpcEngine}}. The full test suite is really overkill for this kind of testing; a good set of RPC-focused tests would do, if we had them. But in my opinion that kind of focused testing would be easier to handle in a security module than as part of a monolithic build. HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar is in the maven repo at repository.a.o.
I see no reason to do a separate artifact for the security-related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security-related code is loaded by default. In this issue I propose we merge the code under security/ into src/ and remove the maven profile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624 ] Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM: -- bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars was (Author: jirapos...@reviews.apache.org): bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq.
https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq.
[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556 ] Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted was (Author: jirapos...@reviews.apache.org): --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. 
Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. 
--- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for
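The building block under review above can be modeled abstractly. This is a toy sketch (not the actual HRegion code) of the contract: a batch of row mutations is applied atomically only if every row falls inside one region's [startKey, endKey) range, a colocation the application must guarantee via presplitting or a custom RegionSplitPolicy.

```java
import java.util.List;
import java.util.Map;

// Toy model of a multi-row local transaction: validate every row against the
// region's key range first, then apply the whole batch, so a bad row leaves
// the region completely unmodified.
class MultiRowMutationSketch {
  static void mutateRowsAtomically(Map<String, String> region, String startKey,
                                   String endKey, List<String[]> mutations) {
    for (String[] m : mutations) { // m = {row, value}
      if (m[0].compareTo(startKey) < 0 || m[0].compareTo(endKey) >= 0) {
        throw new IllegalArgumentException("Row " + m[0] + " is not in region");
      }
    }
    // In HBase this is where all row locks would be held and the batch
    // written under a single WAL sync.
    for (String[] m : mutations) {
      region.put(m[0], m[1]);
    }
  }
}
```

Validate-then-apply is what makes the operation all-or-nothing at this level; the real implementation additionally holds the row locks for the duration of the write.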
[jira] [Commented] (HBASE-5342) Grant/Revoke global permissions
[ https://issues.apache.org/jira/browse/HBASE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202093#comment-13202093 ] Gary Helmling commented on HBASE-5342: -- Some of the building blocks for this are already in place. It shouldn't be too difficult to fill in the missing pieces. Would be great to see this completed. Grant/Revoke global permissions --- Key: HBASE-5342 URL: https://issues.apache.org/jira/browse/HBASE-5342 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar HBASE-3025 introduced simple ACLs based on coprocessors. It defines global/table/cf/cq level permissions. However, there is no way to grant/revoke global level permissions, other than the hbase.superuser conf setting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202097#comment-13202097 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 my choice would be to make java's CRC32 the default. PureJavaCrc32 is compatible with java's CRC32. However, PureJavaCrc32C is not compatible with either of these. Although PureJavaCrc32 is not part of 1.0, if and when you move to hadoop 2.0, you will automatically get the more performant algorithm via PureJavaCrc32. For the adventurous, one can manually pull PureJavaCrc32C into one's own hbase deployment by explicitly setting hbase.hstore.checksum.algorithm to CRC32C. Does that sound reasonable? src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 sounds good, will make this change. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
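The classpath-dependent fallback described in this patch (use the faster pure-Java CRC when available, else java.util.zip.CRC32, with each bytesPerChecksum chunk storing a 4-byte checksum) can be sketched as follows. This is a simplified model, not the patch's actual factory code; the Hadoop class name is loaded reflectively so the sketch has no hard dependency on it.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of the fallback policy: prefer Hadoop's PureJavaCrc32 when it is on
// the classpath, otherwise fall back to java.util.zip.CRC32, which computes
// the same CRC-32 values. The reflective lookup would be cached in a real
// implementation to keep per-checksum overhead low.
class ChecksumFactory {
  static Checksum newCrc32() {
    try {
      return (Checksum) Class.forName("org.apache.hadoop.util.PureJavaCrc32")
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      return new CRC32(); // JDK fallback, identical checksum values
    }
  }

  static int checksum(byte[] chunk) {
    Checksum crc = newCrc32();
    crc.update(chunk, 0, chunk.length);
    return (int) crc.getValue(); // CRC-32 fits in the 4 bytes stored per chunk
  }
}
```

Note that CRC32C is a different polynomial, so its values are not interchangeable with CRC32; that is why the algorithm name is recorded per file rather than guessed at read time.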
[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5292: --- Attachment: D1617.1.patch zhiqiu requested code review of [jira] [HBASE-5292] Prevent counting getSize on compactions. Reviewers: Kannan, mbautin, Liyin, JIRA Added two separate metrics for both get() and next(). This is done by refactoring the internal next() API. To be more specific, only Get.get() and ResultScanner.next() pass the metric name (getsize and nextsize respectively) to HRegion::RegionScanner::next(List<KeyValue>, String). This will eventually hit StoreScanner::next(List<KeyValue>, int, String), where the metrics are counted. And their call paths are:
1) Get: HTable::get(final Get get) => HRegionServer::get(byte[] regionName, Get get) => HRegion::get(final Get get, final Integer lockid) => HRegion::get(final Get get) [pass METRIC_GETSIZE to the callee] => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric) => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric) => HRegion::RegionScanner::nextInternal(int limit, String metric) => KeyValueHeap::next(List<KeyValue> result, int limit, String metric) => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
2) Next: HTable::ClientScanner::next() => ScannerCallable::call() => HRegionServer::next(long scannerId) => HRegionServer::next(final long scannerId, int nbRows) [pass METRIC_NEXTSIZE to the callee] => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric) => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric) => HRegion::RegionScanner::nextInternal(int limit, String metric) => KeyValueHeap::next(List<KeyValue> result, int limit, String metric) => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
Task ID: #898948 Blame Rev: TEST PLAN 1. Passed unit tests. 2.
Created a test case, TestRegionServerMetrics::testGetNextSize, to guarantee that:
* Get/Next contributes to the getsize/nextsize metrics
* Both getsize and nextsize are per column family
* Flush/compaction won't affect these two metrics
Revert Plan: Tags: REVISION DETAIL https://reviews.facebook.net/D1617 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/regionserver/InternalScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per CF. [Note: We already have metrics to track the # of HFileBlocks read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client-initiated Get/Scan operations and for compaction-related reads.
The metric is updated in StoreScanner.java:next() when the scan query matcher returns an INCLUDE* code, via: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in the case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
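The fix described above can be sketched with a toy model. The class and method shape below mirror StoreScanner::next(List<KeyValue>, int, String) from the call paths in the review, but this is a simplified, self-contained stand-in, not the real HBase API: client Gets/Scans pass a metric name down to next(), while compaction scanners pass null, so compaction reads never touch the per-CF size metrics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for StoreScanner: a size metric is only incremented when a
// metric name is supplied; compaction scanners pass null and are skipped.
public class ToyStoreScanner {
    private final Map<String, Long> metrics = new HashMap<>();
    private final List<byte[]> cells;

    public ToyStoreScanner(List<byte[]> cells) { this.cells = cells; }

    // Mirrors the shape of StoreScanner::next(List<KeyValue>, int, String metric).
    public boolean next(List<byte[]> out, int limit, String metric) {
        int taken = 0;
        while (!cells.isEmpty() && taken < limit) {
            byte[] kv = cells.remove(0);
            out.add(kv);
            taken++;
            if (metric != null) { // null metric means a compaction read: no accounting
                metrics.merge(metric, (long) kv.length, Long::sum);
            }
        }
        return !cells.isEmpty();
    }

    public long getMetric(String name) { return metrics.getOrDefault(name, 0L); }

    public static void main(String[] args) {
        List<byte[]> data = new ArrayList<>(List.of(new byte[10], new byte[20]));
        ToyStoreScanner scanner = new ToyStoreScanner(data);
        List<byte[]> out = new ArrayList<>();
        scanner.next(out, 1, "getsize");   // client read: 10 bytes counted
        scanner.next(out, 1, null);        // compaction read: not counted
        System.out.println(scanner.getMetric("getsize")); // prints 10
    }
}
```

This is why the patch threads the metric name through every layer of the next() chain: the decision to count is made once, at the point the scanner is opened, rather than guessed at inside StoreScanner.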
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202100#comment-13202100 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Ted: I forgot to state that one can change the default checksum algorithm at any time. No disk format upgrade is necessary. Each hfile stores the checksum algorithm that is used to store data inside it. If today you use CRC32 and tomorrow you change the configuration setting to CRC32C, then new files that are generated (as part of memstore flushes and compactions) will start using CRC32C, while older files will continue to be verified via the CRC32 algorithm. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the data file and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.
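Dhruba's point — the checksum algorithm is recorded per hfile, so files written under the old and new default can coexist — can be illustrated with plain JDK classes. This is a hedged sketch: the one-byte algorithm tag and the dispatch method below are invented for illustration and are not HBase's actual on-disk format; only java.util.zip.CRC32 and java.util.zip.CRC32C (JDK 9+) are real APIs.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Hypothetical per-file algorithm tags; HBase's real trailer format differs.
    static final byte ALGO_CRC32 = 0;
    static final byte ALGO_CRC32C = 1;

    // Dispatch on the tag stored in each file, so data written with either
    // algorithm remains verifiable after the cluster-wide default changes.
    static Checksum forTag(byte tag) {
        switch (tag) {
            case ALGO_CRC32:  return new CRC32();
            case ALGO_CRC32C: return new CRC32C();
            default: throw new IllegalArgumentException("unknown tag " + tag);
        }
    }

    static long checksum(byte tag, byte[] data) {
        Checksum c = forTag(tag);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        // Standard check values for the two polynomials:
        System.out.printf("CRC32  = %08X%n", checksum(ALGO_CRC32, check));  // CBF43926
        System.out.printf("CRC32C = %08X%n", checksum(ALGO_CRC32C, check)); // E3069283
    }
}
```

The two algorithms use different polynomials and produce different values for the same bytes, which is exactly why the tag must live in the file rather than in the configuration alone.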
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202103#comment-13202103 ] stack commented on HBASE-5341: -- bq. This would break the ability to compile HBase 0.92+ against Hadoop releases without security. Perhaps we could entertain breaking this for 0.94.0? i.e. saying we only run on hadoops w/ security? (CDH3 has it? What doesn't have it that we want to run on by the time 0.94.0 is out?) On modularization, yes, if HBASE-4336 is done soon, security is a natural. Otherwise, we should do as Enis suggests. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202105#comment-13202105 ] Lars Hofhansl commented on HBASE-5229: -- It's even simpler. A coprocessor endpoint has access to its region. If I rename internalMutate from my patch to mutateRowsWithLocks(List<Mutation> mutations, Set<String> rowsToLock) and make it public in HRegion, it can be called from a coprocessor endpoint. It would not be exposed in HRegionInterface, HRegionServer, RegionServerServices, or the HTableInterfaces. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross-region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase rows. 2. Define a prefix length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client-side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list, with HBASE-3584 and HBASE-5203 committed, supporting atomic cross-row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion; I'll attach a patch (with tests) momentarily to make this concrete.
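The mutateRowsWithLocks(List<Mutation> mutations, Set<String> rowsToLock) building block Lars describes can be sketched outside HBase with a toy region: take all requested row locks (in sorted order, to avoid deadlock between concurrent callers), apply every mutation, then release. Everything below is a simplified stand-in for illustration; only the method shape comes from the comment above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

public class ToyRegion {
    private final Map<String, String> rows = new HashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new HashMap<>();

    private ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // Toy analogue of HRegion::mutateRowsWithLocks(List<Mutation>, Set<String>):
    // all row locks are taken in sorted order before any write is applied, so a
    // concurrent reader holding the same locks sees all mutations or none.
    public void mutateRowsWithLocks(List<String[]> mutations, Set<String> rowsToLock) {
        SortedSet<String> ordered = new TreeSet<>(rowsToLock);
        for (String row : ordered) lockFor(row).lock();
        try {
            for (String[] m : mutations) rows.put(m[0], m[1]); // m = {row, value}
        } finally {
            for (String row : ordered) lockFor(row).unlock();
        }
    }

    public String get(String row) { return rows.get(row); }
}
```

Because the method lives on the region and all locked rows are local to it, this stays a "local" transaction in the sense of the issue: no cross-region coordination is needed.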
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202108#comment-13202108 ] stack commented on HBASE-5325: -- @Enis metrics2 seems a bit out there for us (hadoop 0.23?). We want to run on 0.23 and 1.0 and 2.0, etc., so it'd be a while before we could lean on it. metrics2 has facility that would help? (I've not studied it). @Hitesh Regards "I am still digging into jmx internals but I could not find anything which mentions it as an option for pushing information", even if there was a means (IIRC there is but am likely off), I think we'd have the master pulling. bq. Having the master pull information from all region servers using jmx (or any other point-to-point protocol) would likely be a bad idea from a performance point of view. Currently every regionserver sends status every (configurable) second. It's a fat Writable serialization of each regionserver's counters and current state. IIRC, this mechanism runs mostly independent of and beside our metrics (so there'll be Writable serialization of region state, and if something like tsdb is running, there'll be a JMX serialization of server state happening too). Would be an improvement if we did metrics reporting one way only, if possible. bq. Also, was your intention to have the HMaster be a metric aggregator for the RegionServers' metrics? It does this now for key stats. bq. I still need to look at nesting of mbeans from various components and also need to look at the hbase code in more detail to see what kind of management options could be exposed via jmx. I'd be interested in what you think. We need to figure out being able to config a running cluster; i.e. change Configuration values and have hbase notice. Having this go via jmx would likely be like taking the 'killarney road to dingle' as my grandma used to say (it's shorter if you take the tralee road), so maybe jmx is read-only rather than 'management'.
Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
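A minimal example of the kind of MBean exposure being proposed here, using only javax.management from the JDK. The MasterStatus bean name, its ObjectName, and the RegionServerCount attribute are invented for illustration; they are not HBase's actual beans.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MasterStatusDemo {
    // Standard MBean pattern: an XxxMBean interface plus an implementing class Xxx.
    public interface MasterStatusMBean {
        int getRegionServerCount();
    }

    public static class MasterStatus implements MasterStatusMBean {
        public int getRegionServerCount() { return 3; } // illustrative value
    }

    // Register the bean, then read the attribute back through the platform
    // MBeanServer -- the same path a JMX client (jconsole, a tsdb collector)
    // would use remotely.
    public static int registerAndRead() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("toy.hbase:type=MasterStatus");
        if (!server.isRegistered(name)) {
            server.registerMBean(new MasterStatus(), name);
        }
        return (Integer) server.getAttribute(name, "RegionServerCount");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registerAndRead()); // prints 3
    }
}
```

Note that the getter-only interface makes this a read-only bean, which fits stack's suggestion above that jmx be used for observation rather than 'management'.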
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202124#comment-13202124 ] Jonathan Hsieh commented on HBASE-5341: --- I'd be happy if the tarball(s) for 0.92.1 just come out with the ./security directory in the correct place in the tarball. I would think that would be the most expedient thing to do. (This is, however, likely easier said than done.) @Gary - does compiling against hdfs 1.0.0 and running this new hbase jar in non-secure mode against the append-branch hdfs jar work? If it works, I'm not convinced compilation matters as much. I do most of my system testing of these releases from jars compiled against hadoop 1.0.0 but running on top of a cdh3 hdfs version -- no problems. If the hbase binary doesn't work, then I agree that this is a concern -- it would block shops locked into their own hdfs branches that don't support hadoop security. - Using the module approach, the security stuff would be a separate jar that we could add to the classpath, right? @Ted - There are plenty of pieces we've released that haven't been tested by many people. That said, regardless of how it is included, I can commit that we'll be testing the access control and security features. @Stack - Hadoop 1.0.0 and CDH3 both have security and append. The next HDFS it seems most folks are coalescing around is 0.23, which also has security and append. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o.
I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Updated] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4658: Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Put attributes are not exposed via the ThriftServer --- Key: HBASE-4658 URL: https://issues.apache.org/jira/browse/HBASE-4658 Project: HBase Issue Type: Bug Components: thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1563.1.patch, D1563.1.patch, D1563.1.patch, D1563.2.patch, D1563.2.patch, D1563.2.patch, D1563.3.patch, D1563.3.patch, D1563.3.patch, ThriftPutAttributes1.txt The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift.
[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression
[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202127#comment-13202127 ] dhruba borthakur commented on HBASE-5313: - One option listed above is to keep all the keys in the beginning of the block and all the values at the end of the block. The keys will still be delta-encoded. The values can be lzo-compressed. Any other ideas out there? Restructure hfiles layout for better compression Key: HBASE-5313 URL: https://issues.apache.org/jira/browse/HBASE-5313 Project: HBase Issue Type: Improvement Components: io Reporter: dhruba borthakur Assignee: dhruba borthakur An HFile block contains a stream of key-values. Can we organize these kvs on disk in a better way so that we get much greater compression ratios? One option (thanks Prakash) is to store all the keys in the beginning of the block (let's call this the key section) and then store all their corresponding values towards the end of the block. This will allow us to not even decompress the values when we are scanning and skipping over rows in the block. Any other ideas?
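The layout idea above — keys at the front of the block, values at the back — can be sketched as a simple block encoding. The byte format below (key count, then length-prefixed keys, then length-prefixed values) is hypothetical and much simpler than any real HFile layout; it only shows how a scanner could walk the key section without ever parsing (or decompressing) the value section.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class KeySectionBlock {
    // Hypothetical block layout: [keyCount][len,key]*[len,value]*
    public static byte[] encode(List<byte[]> keys, List<byte[]> values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(keys.size());
        for (byte[] k : keys) { out.writeInt(k.length); out.write(k); }   // key section
        for (byte[] v : values) { out.writeInt(v.length); out.write(v); } // value section
        return buf.toByteArray();
    }

    // A scan that is skipping over rows reads only the key section at the
    // head of the block; the value section at the tail is never touched
    // (and in the proposal could stay lzo-compressed until a value is needed).
    public static List<String> readKeys(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        int n = in.readInt();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            byte[] k = new byte[in.readInt()];
            in.readFully(k);
            keys.add(new String(k));
        }
        return keys;
    }
}
```

Grouping keys with keys and values with values also tends to help the compressor, since similar byte patterns end up adjacent, which is the compression-ratio argument in the issue.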
[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java
[ https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202133#comment-13202133 ] Andrew Purtell commented on HBASE-5343: --- bq. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -1 This issue should be resolved as 'invalid'. Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in.
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202135#comment-13202135 ] Andrew Purtell commented on HBASE-5341: --- If we drop support for 0.20.x then compilation issues mostly go away. It won't fail for lack of security APIs in Hadoop. I lean with Gary's view that security belongs in a (Maven) module. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose, we merge the code under /security to src/ and remove the maven profile.