[jira] [Updated] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5074:
---

Attachment: D1521.3.patch

dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache.
Reviewers: mbautin

  Many new goodies, thanks to the feedback from Mikhail and Todd. This completes
  my addressing of all the current review comments. If somebody can re-review it,
  that would be great.

  1. The bytesPerChecksum is configurable. One can set
  hbase.hstore.bytes.per.checksum in the config to set this. The default value
  is 16K. Similarly, one can set hbase.hstore.checksum.name to either CRC32 or
  CRC32C. The default is CRC32. If the PureJavaCRC32 algorithm is available in
  the classpath, then it is used, otherwise it falls back to using
  java.util.zip.CRC32. Each checksum value is assumed to be 4 bytes; this is
  currently not configurable (any comments here?). The reflection-based method
  of creating checksum objects is reworked to incur much lower overhead. (See
  the sketch after this list.)

  2. If an hbase-level crc check fails, it falls back to using hdfs-level
  checksums for the next few reads (defaults to 100). After that, it will retry
  using hbase-level checksums. I picked 100 as the default so that even in the
  case of continuous hbase-checksum failures, the overhead of the additional
  iops is limited to 1%. Enhanced the unit test to validate this behaviour.

  3. Enhanced unit tests to test different sizes of bytesPerChecksum. Also added
  JMX metrics to record the number of times hbase-checksum verification failures
  occur.
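
  A minimal sketch (not part of the patch) of how a client might set the two
  properties from item 1; the property names are from the comment above, while
  the class name and the values shown are illustrative only:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ChecksumConfigSketch {
  public static Configuration checksumConf() {
    Configuration conf = HBaseConfiguration.create();
    // Number of data bytes covered by each checksum value; 16K is the stated default.
    conf.setInt("hbase.hstore.bytes.per.checksum", 16 * 1024);
    // Checksum algorithm; CRC32 is the stated default, CRC32C the alternative.
    conf.set("hbase.hstore.checksum.name", "CRC32");
    return conf;
  }
}
{code}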

REVISION DETAIL
  https://reviews.facebook.net/D1521

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
  src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
  src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
  src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  


[jira] [Updated] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5074:
--

Status: Patch Available  (was: Open)

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the data file and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.





[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201200#comment-13201200
 ] 

Hadoop QA commented on HBASE-5074:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12513416/D1521.3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 76 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -133 warning messages.

+1 javac.  The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of release audit warnings.

 -1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
  org.apache.hadoop.hbase.util.TestMergeTool
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
  org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
  org.apache.hadoop.hbase.io.hfile.TestHFileBlock
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/907//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/907//console

This message is automatically generated.

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the data file and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.





[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201205#comment-13201205
 ] 

Anoop Sam John commented on HBASE-2038:
---

Hi Lars,
  I am also trying to build a secondary index, and I have found the IHBase 
concept to be good. But we need this moved to a coprocessor-based approach so 
that the HBase kernel code need not be different for the secondary index. 
IHBase makes the scan go through all the regions (as you said), but they skip 
and seek to later positions in the heap, avoiding many of the possible data 
reads from HDFS, etc.

With the current coprocessor, we call preScannerNext() from HRegionServer 
next(final long scannerId, int nbRows) and pass the RegionScanner to the 
coprocessor. But per the IHBase way, within the coprocessor we should be able 
to seek to the correct row where the indexed column value equals our value. We 
cannot do this as of now, because RegionScanner seek() is not there.

Also, this preScannerNext() is called once, before the actual next(final long 
scannerId, int nbRows) call happens on the region. Depending on the cache 
value at the client side, nbRows might be more than one. Now suppose nbRows=2 
and in the region we have two rows, one somewhere in the middle part of one 
HFile and the other in another HFile. Per IHBase, we should first seek to the 
position of the first row and, after reading this data, seek to the next 
position. With the current way of calling preScannerNext(), this won't be 
possible. So I think we might need some changes in this area? What do you say?

Meanwhile, what is your plan: continue with the IHBase way of storing the 
index in memory for each region, or some change to this?
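
A minimal sketch of the hook being discussed, assuming the 0.92-era 
RegionObserver API; the class name is invented, and the index-driven 
repositioning Anoop wants is deliberately left out because, as he notes, 
RegionScanner seek() is not yet available:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

public class IndexedScanObserver extends BaseRegionObserver {
  @Override
  public boolean preScannerNext(ObserverContext<RegionCoprocessorEnvironment> c,
      InternalScanner s, List<Result> results, int limit, boolean hasMore)
      throws IOException {
    // This is where an index-driven scan would want to reposition the
    // scanner to the first matching row; doing so needs the RegionScanner
    // seek() discussed above.
    return hasMore; // fall through to the default scan
  }
}
{code}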

 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Andrew Purtell
Priority: Minor

 HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as 
 a good goalpost for coprocessor environment design -- there should be enough 
 of it so that region level indexing can be reimplemented as a coprocessor 
 without any loss of functionality. 





[jira] [Created] (HBASE-5338) Add SKIP support to importtsv

2012-02-06 Thread Lars George (Created) (JIRA)
Add SKIP support to importtsv 
--

 Key: HBASE-5338
 URL: https://issues.apache.org/jira/browse/HBASE-5338
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Lars George
Priority: Trivial


It'd be nice to have support for SKIP mappings so that you can omit columns 
from the TSV during the import. For example:

{code}
-Dimporttsv.columns=SKIP,HBASE_ROW_KEY,cf1:col1,cf1:col2,SKIP,SKIP,cf2:col1...
{code}

Or maybe HBASE_SKIP_COLUMN to be less ambiguous. 





[jira] [Created] (HBASE-5339) Add support for compound keys to importtsv

2012-02-06 Thread Lars George (Created) (JIRA)
Add support for compound keys to importtsv
--

 Key: HBASE-5339
 URL: https://issues.apache.org/jira/browse/HBASE-5339
 Project: HBase
  Issue Type: Improvement
Reporter: Lars George








[jira] [Updated] (HBASE-5339) Add support for compound keys to importtsv

2012-02-06 Thread Lars George (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars George updated HBASE-5339:
---

Component/s: mapreduce
Description: 
Add support for combining several columns from the TSV into the row key, with 
either a given separator, no separator, or a custom row key generator class. 
Syntax could be:

{code}
-Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3
-Dimporttsv.rowkey.separator=-
{code}

Another option, of course, is using a custom mapper class and handling this 
there, but this also seems like a nice-to-have option, probably covering 80% 
of the cases where this sort of thing is needed.
   Priority: Trivial  (was: Major)

 Add support for compound keys to importtsv
 --

 Key: HBASE-5339
 URL: https://issues.apache.org/jira/browse/HBASE-5339
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Lars George
Priority: Trivial

 Add support for combining several columns from the TSV into the row key, with 
 either a given separator, no separator, or a custom row key generator class. 
 Syntax could be:
 {code}
 -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3
 -Dimporttsv.rowkey.separator=-
 {code}
 Another option, of course, is using a custom mapper class and handling this 
 there, but this also seems like a nice-to-have option, probably covering 80% 
 of the cases where this sort of thing is needed.





[jira] [Commented] (HBASE-5339) Add support for compound keys to importtsv

2012-02-06 Thread Lars George (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201255#comment-13201255
 ] 

Lars George commented on HBASE-5339:


Obviously, you can rearrange the compound key parts by using different 
HBASE_ROW_KEY_N, where the ordering N is chosen by the user.

{code}
-Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_3,cf1:col1,cf2:col3,HBASE_ROW_KEY_2
{code}

 Add support for compound keys to importtsv
 --

 Key: HBASE-5339
 URL: https://issues.apache.org/jira/browse/HBASE-5339
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Lars George
Priority: Trivial

 Add support for combining several columns from the TSV into the row key, with 
 either a given separator, no separator, or a custom row key generator class. 
 Syntax could be:
 {code}
 -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3
 -Dimporttsv.rowkey.separator=-
 {code}
 Another option, of course, is using a custom mapper class and handling this 
 there, but this also seems like a nice-to-have option, probably covering 80% 
 of the cases where this sort of thing is needed.





[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations

2012-02-06 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201351#comment-13201351
 ] 

Nicolas Spiegelberg commented on HBASE-5335:


@Lars: the original idea was to allow users to arbitrarily set KVs in the 
HTableDescriptor and HColumnDescriptor, but make it so users know that what 
they're doing is not checked.  We need some sort of format to distinguish 
between reserved keywords and non-reserved ones (thinking of doing this on the 
client side).  As a config value becomes more well-known, we can enforce 
limitations like you stated.

I'd rather have this evolve by having a handful of users who want to set a 
config value, learn over the long term that it is useful, and incrementally 
refactor the code to ease support for that config.  I don't want to get into a 
spot where we have to do a large refactor to support this feature and do 
extensive sanity checking, only to determine that we only need 20% of the 
config values.
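
A minimal sketch of what the unchecked-KV idea could look like from the client 
side; the key name is invented for illustration and nothing here is the 
committed design:

{code}
import org.apache.hadoop.hbase.HTableDescriptor;

public class DynamicSchemaSketch {
  public static void main(String[] args) {
    HTableDescriptor htd = new HTableDescriptor("t1");
    // An arbitrary, non-reserved key-value pair stashed in the schema; per
    // the proposal, nothing validates it yet, and well-known keys could
    // gain checks over time.
    htd.setValue("x.hbase.hstore.compaction.min", "6");
  }
}
{code}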

 Dynamic Schema Configurations
 -

 Key: HBASE-5335
 URL: https://issues.apache.org/jira/browse/HBASE-5335
 Project: HBase
  Issue Type: New Feature
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
  Labels: configuration, schema

 Currently, the ability for a core developer to add per-table and per-CF 
 configuration settings is very heavyweight.  You need to add a reserved 
 keyword all the way up the stack, and you have to support this variable 
 long-term if you're going to expose it explicitly to the user.  This has 
 ended up with using Configuration.get() a lot, because it is lightweight and 
 you can tweak settings while you're trying to understand system behavior 
 [since there are many config params that may never need to be tuned].  We 
 need to add the ability to put and read arbitrary KV settings in the HBase 
 schema.  Combined with online schema change, this will allow us to safely 
 iterate on configuration settings.





[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201354#comment-13201354
 ] 

jirapos...@reviews.apache.org commented on HBASE-5229:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4833
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10616

Do we need to be sorting rowsToLock?

I'm thinking of multiple concurrent mutateRows operations trying to lock 
the same set of rows.

Perhaps throwing an IOException is going to prevent us from a situation where 
we end up with a deadlock. But we still might want to sort the rows to ensure 
(better) progress (no livelock).


- Amitanand


On 2012-02-03 19:59:55, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-03 19:59:55)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature, and this is prominently called out in the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 
bq.  
bq.  Diff: https://reviews.apache.org/r/3748/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests added to TestFromClientSide and TestAtomicOperation
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed 

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201365#comment-13201365
 ] 

jirapos...@reviews.apache.org commented on HBASE-5229:
--



bq.  On 2012-02-06 15:52:43, Amitanand Aiyer wrote:
bq.   http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4212
bq.   https://reviews.apache.org/r/3748/diff/2/?file=72266#file72266line4212
bq.  
bq.   Do we need to be sorting rowsToLock?
bq.   
bq.   I'm thinking of multiple concurrent mutateRows operations trying to lock the same set of rows.
bq.   
bq.   Perhaps throwing an IOException is going to prevent us from a situation where we end up with a deadlock. But we still might want to sort the rows to ensure (better) progress (no livelock).

MutateRows sorts them (by using a TreeSet with Bytes.BYTES_COMPARATOR) for 
exactly this reason.
Maybe this should be called out here by making the argument a SortedSet.


- Lars
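
A minimal sketch of the lock-ordering idea above (not the patch itself): 
collecting rows into a TreeSet ordered by Bytes.BYTES_COMPARATOR gives every 
mutateRows call the same canonical lock order, which prevents deadlock between 
concurrent calls:

{code}
import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedRowLockSketch {
  // Rows returned in canonical byte order; acquiring row locks in this
  // iteration order means two concurrent callers can never hold locks in
  // conflicting orders.
  public static SortedSet<byte[]> canonicalOrder(byte[]... rows) {
    SortedSet<byte[]> sorted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    for (byte[] row : rows) {
      sorted.add(row);
    }
    return sorted;
  }
}
{code}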


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4833
---


On 2012-02-03 19:59:55, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-03 19:59:55)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature, and this is prominently called out in the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 
bq.  
bq.  Diff: https://reviews.apache.org/r/3748/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests added to TestFromClientSide and TestAtomicOperation
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in 

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201400#comment-13201400
 ] 

Phabricator commented on HBASE-5074:


tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:425 This cast is not safe. See https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/:

  Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.DistributedFileSystem cannot be cast to org.apache.hadoop.hbase.util.HFileSystem
    at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:425)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:433)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:407)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:328)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:326)
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 Should we default to CRC32C ?
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:2 No year is needed.
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:59 Shall we name this variable ctor ?

  Similar comment applies to other meth variables in this patch.

REVISION DETAIL
  https://reviews.facebook.net/D1521
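
A minimal sketch (an assumption, not the patch's actual fix) of one defensive 
way to handle the cast flagged above: wrap the FileSystem only when it is not 
already an HFileSystem. The wrapping constructor shown is assumed to exist:

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.util.HFileSystem;

public class FsWrapSketch {
  static HFileSystem asHFileSystem(FileSystem fs) {
    if (fs instanceof HFileSystem) {
      return (HFileSystem) fs; // already the wrapper type; cast is safe
    }
    return new HFileSystem(fs); // assumed wrapping constructor
  }
}
{code}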


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the data file and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.






[jira] [Updated] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5074:
--

Comment: was deleted

(was: tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:425 This cast is not safe. See https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/:

  Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.DistributedFileSystem cannot be cast to org.apache.hadoop.hbase.util.HFileSystem
    at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:425)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:433)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:407)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:328)
    at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:326)
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 Should we default to CRC32C ?
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:2 No year is needed.
  src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:59 Shall we name this variable ctor ?

  Similar comment applies to other meth variables in this patch.

REVISION DETAIL
  https://reviews.facebook.net/D1521
)

 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the data file and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.





[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201408#comment-13201408
 ] 

Zhihong Yu commented on HBASE-5267:
---

@J-D:
Do you want to take a look at patch v3 ?

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.





[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201431#comment-13201431
 ] 

Zhihong Yu commented on HBASE-5317:
---

I don't see ClassNotFoundException in the test output, but the following may 
provide a clue:
{code}
2012-02-06 09:44:48,377 WARN  [main] mapreduce.JobSubmitter(139): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-02-06 09:44:48,380 WARN  [main] mapreduce.JobSubmitter(241): No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2012-02-06 09:44:51,163 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(192): Exit code from task is : 127
2012-02-06 09:44:51,165 WARN  [ContainersLauncher #0] launcher.ContainerLaunch(273): Container exited with a non-zero exit code 127
{code}

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.





[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-06 Thread Jai Kumar Singh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jai Kumar Singh updated HBASE-5166:
---

Status: Patch Available  (was: Open)

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 There is currently no MultiThreadedTableMapper in HBase, analogous to the 
 MultiThreadedMapper in Hadoop for IO-bound jobs. 
 Use case, webcrawler: take input (urls) from an hbase table and put the content 
 (urls, content) back into hbase. 
 Running this kind of hbase mapreduce job with the normal table mapper is quite 
 slow, as we are not utilizing the CPU fully (the job is network/IO bound).
 Moreover, I want to know whether it would be a good or bad idea to use HBase 
 for this kind of use case.





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-06 Thread Jai Kumar Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201433#comment-13201433
 ] 

Jai Kumar Singh commented on HBASE-5166:


Any comments ??

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 There is currently no MultiThreadedTableMapper in HBase, analogous to the 
 MultiThreadedMapper in Hadoop for IO-bound jobs. 
 Use case, webcrawler: take input (urls) from an hbase table and put the content 
 (urls, content) back into hbase. 
 Running this kind of hbase mapreduce job with the normal table mapper is quite 
 slow, as we are not utilizing the CPU fully (the job is network/IO bound).
 Moreover, I want to know whether it would be a good or bad idea to use HBase 
 for this kind of use case.





[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201439#comment-13201439
 ] 

Zhihong Yu commented on HBASE-5166:
---

MultithreadedTableMapper is missing the Apache license.

{code}
+while(!executor.isTerminated()){
+  // wait till all the threads are done
+}
{code}
We should put a sleep() in the above loop and possibly limit the total 
duration of the wait.

A new unit test should be added for MultithreadedTableMapper.
Please look at tests that use TableMapper.
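
A minimal sketch of the bounded wait suggested above, using awaitTermination 
in place of the busy loop; the 30-minute cap is illustrative:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class BoundedShutdownSketch {
  public static void awaitQuietly(ExecutorService executor)
      throws InterruptedException {
    executor.shutdown();
    // Blocks (with internal waiting) instead of spinning; gives up after
    // the cap and cancels any straggling mapper threads.
    if (!executor.awaitTermination(30, TimeUnit.MINUTES)) {
      executor.shutdownNow();
    }
  }
}
{code}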

 MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
 --

 Key: HBASE-5166
 URL: https://issues.apache.org/jira/browse/HBASE-5166
 Project: HBase
  Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
  Labels: multithreaded, tablemapper
 Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 
 0003-Added-MultithreadedTableMapper-HBASE-5166.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 There is currently no MultiThreadedTableMapper in HBase, analogous to the 
 MultiThreadedMapper in Hadoop for IO-bound jobs. 
 Use case, webcrawler: take input (urls) from an hbase table and put the content 
 (urls, content) back into hbase. 
 Running this kind of hbase mapreduce job with the normal table mapper is quite 
 slow, as we are not utilizing the CPU fully (the job is network/IO bound).
 Moreover, I want to know whether it would be a good or bad idea to use HBase 
 for this kind of use case.





[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201444#comment-13201444
 ] 

Zhihong Yu commented on HBASE-5317:
---

For HBASE-5317-v1.patch, I think we shouldn't simply catch TableExistsException.
We should add the missing Configuration parameters in MRv2 so that there is no 
TableExistsException during the test.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.





[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201460#comment-13201460
 ] 

Zhihong Yu commented on HBASE-5317:
---

Configuration.handleDeprecation() is private.
We may need to borrow deprecatedKeyMap and come up with a good strategy for 
providing up-to-date config parameters to MRv2.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.





[jira] [Issue Comment Edited] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201460#comment-13201460
 ] 

Zhihong Yu edited comment on HBASE-5317 at 2/6/12 6:47 PM:
---

Configuration.handleDeprecation() is private.
We may need to borrow deprecatedKeyMap and come up with a good strategy for 
providing up-to-date config parameters to MRv2 (when the hadoop.profile 
property carries the value 23).

  was (Author: zhi...@ebaysf.com):
Configuration.handleDeprecation() is private.
We may need to borrow deprecatedKeyMap and come up with a good strategy for 
providing up-to-date config parameters to MRv2.
  
 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Gregory Chanan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201473#comment-13201473
 ] 

Gregory Chanan commented on HBASE-5317:
---

@Ted:
I agree we shouldn't just catch the TableExistsException -- that's why I didn't 
initially post that part of the patch.  My recollection of that issue was that 
the MiniMRCluster is creating a target directory in hbase.rootdir [I could be 
wrong about the exact location].  When we call table.getTableDescriptor() it 
can't get a table descriptor for target, so it throws a TableExistsException.  
Can the handleDeprecation call prevent the target directory from being created? 
 Or are you thinking of something else?

It also seems a little strange to me that calling table.getTableDescriptor() 
tries to get table descriptors for *everything* in hbase.rootdir.  Why should 
an exception be thrown because hbase can't find a TableDescriptor for target 
when I am only asking about table?

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5340) HFile/LoadIncrementalHFiles should specify file name when it fails to load a file.

2012-02-06 Thread Jonathan Hsieh (Created) (JIRA)
HFile/LoadIncrementalHFiles should specify file name when it fails to load a 
file.
--

 Key: HBASE-5340
 URL: https://issues.apache.org/jira/browse/HBASE-5340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: Jonathan Hsieh


I was attempting to do a bulk load and got this error message.  Unfortunately 
it didn't tell me which file had the problem.

{code}
Exception in thread "main" java.io.IOException: Trailer 'header' is wrong; does 
the trailer size match content?
at 
org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
at 
org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
at 
org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
at 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryLoad(LoadIncrementalHFiles.java:204)
at 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:173)
at 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:452)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:457)
{code}
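
A minimal sketch of the fix the report implies, assuming a hypothetical load 
hook (the real change would live in LoadIncrementalHFiles.tryLoad; only the 
wrap-and-name pattern is the point):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.Path;

public final class NamedHFileLoad {
  /** Hypothetical stand-in for the actual HFile-opening logic. */
  interface Loader {
    void load() throws IOException;
  }

  // Re-throw any failure with the file name attached, so a message like
  // "Trailer 'header' is wrong" also identifies the offending HFile.
  static void loadNaming(Path hfilePath, Loader loader) throws IOException {
    try {
      loader.load();
    } catch (IOException ioe) {
      throw new IOException("Failed to load HFile " + hfilePath, ioe);
    }
  }
}
{code}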

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201488#comment-13201488
 ] 

Zhihong Yu commented on HBASE-5317:
---

The handleDeprecation() call may not prevent the target directory from being 
created.
We should try to find out why the 'target' directory is only created for hadoop 
0.23.

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201491#comment-13201491
 ] 

Zhihong Yu commented on HBASE-5317:
---

Not every directory under hbase.rootdir corresponds to a table.
See the following javadoc in FSUtils.getTableDirs():
{code}
   * @return All the table directories under <code>rootdir</code>. Ignore 
non-table hbase folders such as
   * .logs, .oldlogs, .corrupt, .META., and -ROOT- folders.
{code}
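
A sketch of the name-based filtering that javadoc describes (the class and 
method below are illustrative, not the actual FSUtils code):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class TableDirFilter {
  // Non-table folders named in the javadoc above.
  private static final List<String> NON_TABLE = Arrays.asList(
      ".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-");

  static List<Path> getTableDirs(FileSystem fs, Path rootdir)
      throws IOException {
    List<Path> tableDirs = new ArrayList<Path>();
    for (FileStatus st : fs.listStatus(rootdir)) {
      if (st.isDir() && !NON_TABLE.contains(st.getPath().getName())) {
        tableDirs.add(st.getPath());
      }
    }
    return tableDirs;
  }
}
{code}

Note that an unexpected directory such as 'target' would pass a filter like 
this, which matches the failure Gregory describes.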

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201515#comment-13201515
 ] 

jirapos...@reviews.apache.org commented on HBASE-5229:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/
---

(Updated 2012-02-06 19:51:58.341235)


Review request for hbase.


Changes
---

Addressed a few comments.
In addition, the client can retry a MultiRowMutation if the first row is not 
available in the region (as that might indicate that the region has moved).


Summary
---

This builds on HBASE-3584, HBASE-5203, and HBASE-5304.

Multiple Rows can be locked and applied atomically as long as the application 
ensures that all rows reside in the same Region (by presplitting or a custom 
RegionSplitPolicy).
At SFDC we can use this to colocate subsets of a tenant's data and allow atomic 
operations over these subsets.

Obviously this is an advanced feature, and this is prominently called out in the 
Javadoc.
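
A minimal sketch of the custom RegionSplitPolicy route mentioned above, 
assuming a fixed-length grouping prefix (the subclass is hypothetical; only the 
getSplitPoint() hook is taken from the HBase base class):

{code}
import java.util.Arrays;

import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

// Hypothetical policy: never split inside a fixed-length row-key prefix,
// so all rows sharing a prefix stay co-located in one region.
public class PrefixGroupingSplitPolicy extends ConstantSizeRegionSplitPolicy {
  private static final int PREFIX_LENGTH = 8;  // assumed grouping prefix

  @Override
  protected byte[] getSplitPoint() {
    byte[] splitPoint = super.getSplitPoint();
    if (splitPoint != null && splitPoint.length > PREFIX_LENGTH) {
      // Truncate the chosen split point to the prefix so a split always
      // lands on a group boundary.
      return Arrays.copyOf(splitPoint, PREFIX_LENGTH);
    }
    return splitPoint;
  }
}
{code}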


This addresses bug HBASE-5229.
https://issues.apache.org/jira/browse/HBASE-5229


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
 1241120 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
 1241120 

Diff: https://reviews.apache.org/r/3748/diff


Testing
---

Tests added to TestFromClientSide and TestAtomicOperation


Thanks,

Lars



 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201529#comment-13201529
 ] 

Phabricator commented on HBASE-5074:


todd has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  I haven't thought about it quite enough, but is there any way to do this 
without leaking the HFileSystem out to the rest of the code? As Ted pointed 
out, there are some somewhat public interfaces that will probably get touched 
by that, and the number of places it has required changes in unrelated test 
cases seems like a code smell to me.

  Maybe this could be a static cache somewhere that, given a FileSystem 
instance, maintains the un-checksummed equivalents as weak references? Then the 
concept would be self-contained within the HFile code, which up till now has 
been a fairly standalone file format.
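
A minimal sketch of the static weak-reference cache being suggested, assuming a 
hypothetical wrapper factory (none of these names are from the patch):

{code}
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.hadoop.fs.FileSystem;

final class NoChecksumFsCache {
  // Weak on both sides: the entry disappears once the original FileSystem
  // is unreachable, and the cached equivalent never pins the key.
  private static final Map<FileSystem, WeakReference<FileSystem>> CACHE =
      new WeakHashMap<FileSystem, WeakReference<FileSystem>>();

  static synchronized FileSystem getNoChecksumFs(FileSystem fs) {
    WeakReference<FileSystem> ref = CACHE.get(fs);
    FileSystem noChecksum = (ref == null) ? null : ref.get();
    if (noChecksum == null) {
      noChecksum = wrapWithoutChecksums(fs);
      CACHE.put(fs, new WeakReference<FileSystem>(noChecksum));
    }
    return noChecksum;
  }

  private static FileSystem wrapWithoutChecksums(FileSystem fs) {
    // Placeholder: the real code would construct the un-checksummed variant.
    return fs;
  }
}
{code}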

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5336) Spurious exceptions in HConnectionImplementation

2012-02-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201536#comment-13201536
 ] 

Lars Hofhansl commented on HBASE-5336:
--

Interestingly I find no matching logs on the RegionServers or Datanodes.
I feel like I have seen a jira about this before, but I cannot find it.

 Spurious exceptions in HConnectionImplementation
 

 Key: HBASE-5336
 URL: https://issues.apache.org/jira/browse/HBASE-5336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl

 I have seen this on the client a few times during heavy write testing:
 java.util.concurrent.ExecutionException: java.io.IOException: 
 java.io.IOException: java.lang.NullPointerException
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730)
   at NewsFeedCreate.insert(NewsFeedCreate.java:91)
   at NewsFeedCreate$1.run(NewsFeedCreate.java:38)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.io.IOException: java.io.IOException: 
 java.lang.NullPointerException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   ... 1 more
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363)
   at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
   ... 7 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556
 ] 

jirapos...@reviews.apache.org commented on HBASE-5229:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4844
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10643

What if rm contains more than one Mutation?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10642

The else is not needed, considering an exception is thrown on line 4170.


- Ted


On 2012-02-06 19:51:58, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-06 19:51:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the 
application ensures that all rows reside in the same Region (by presplitting or 
a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow 
atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature, and this is prominently called out in 
the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
 1241120 
bq.  
bq.  Diff: https://reviews.apache.org/r/3748/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests added to TestFromClientSide and TestAtomicOperation
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.

[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201563#comment-13201563
 ] 

Zhihong Yu commented on HBASE-5229:
---

For my first comment, RowMutation maintains a single row in internalAdd().
So it should be fine passing the row directly to internalMutate().
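
The invariant being referenced, in a minimal sketch (the class shape is 
assumed; the real internalAdd lives in the client code):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: every mutation added must target the same row, so the container
// as a whole can safely be treated as single-row.
public class RowMutationSketch {
  private final byte[] row;
  private final List<Object> mutations = new ArrayList<Object>();

  public RowMutationSketch(byte[] row) {
    this.row = row;
  }

  void internalAdd(byte[] mutationRow, Object mutation) throws IOException {
    if (!Arrays.equals(row, mutationRow)) {
      throw new IOException("The row in the mutation does not match the row " +
          "of this RowMutation");
    }
    mutations.add(mutation);
  }
}
{code}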

 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201569#comment-13201569
 ] 

Phabricator commented on HBASE-5074:


mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  @dhruba; thanks for the fixes! Here are some more comments (I still have to 
go through the last 25% of the new version of the patch).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 
Please address this comment. The javadoc says "major" and the variable name 
says "minor".
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 
Please correct the misspelling.
  src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:352 I 
think this function needs to be renamed to expectAtLeastMajorVersion for clarity
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 I think we 
should either consistently use the onDiskSizeWithHeader field or get rid of it.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Please 
do use a constant instead of 0 here for the minor version.
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3551 Long line
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:60 This lazy 
initialization is not thread-safe. This also applies to other enum members 
below. Can the meth field be initialized in the enum constructor (see the sketch 
after these comments), or do we rely on some classes being loaded by the time 
this initialization is invoked?
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:63-67 Avoid 
repeating org.apache.hadoop.util.PureJavaCrc32 three times in string form
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:74-75 Avoid 
repeating the java.util.zip.CRC32 string
  src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:98-99 Avoid 
repeating the string
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:132 Fix 
indentation
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:174 Fix 
indentation
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:71 
Inconsistent formatting: 1024 + 980.
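
A sketch of the constructor-time initialization asked about above (names are 
assumed, not the patch's actual ChecksumType):

{code}
import java.lang.reflect.Constructor;
import java.util.zip.Checksum;

public enum ChecksumTypeSketch {
  CRC32("java.util.zip.CRC32"),
  PURE_JAVA_CRC32("org.apache.hadoop.util.PureJavaCrc32");

  // Resolved once, eagerly, in the constructor: no racy lazy init.
  private final Constructor<?> ctor;

  ChecksumTypeSketch(String className) {
    Constructor<?> c = null;
    try {
      c = Class.forName(className).getDeclaredConstructor();
    } catch (Exception e) {
      // Implementation not on the classpath; newChecksum() reports it.
    }
    this.ctor = c;
  }

  public Checksum newChecksum() throws Exception {
    if (ctor == null) {
      throw new IllegalStateException(name() + " implementation unavailable");
    }
    return (Checksum) ctor.newInstance();
  }
}
{code}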

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201614#comment-13201614
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Todd: I agree with you. It is messy that the HFileSystem interface is leaking 
out to the unit tests. Instead, inside HFile, I can do something like this when 
a Reader is created:

  if (!(fs instanceof HFileSystem)) {
    fs = new HFileSystem(fs);
  }

  What this means is that users of HFile that already pass in an HFileSystem 
will get the new behaviour. HRegionServer voluntarily creates an HFileSystem 
before invoking HFile anyway, so it works.

  I did not do this earlier because I thought that 'using reflection' is 
costly, but on second thought the cost is not much because it is incurred only 
once, when a new reader is first created. What do you think?


REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201619#comment-13201619
 ] 

Phabricator commented on HBASE-5074:


todd has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Yea, I think the instanceof check, with HFileSystem confined to the hfile 
package, is much better.

  I don't think it should be costly -- as you said, it's only when the reader 
is created, which isn't on the hot code path, and instanceof checks are 
actually quite fast. They turn into a simple compare of the instance's klassid 
header against a constant, if I remember correctly.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624
 ] 

jirapos...@reviews.apache.org commented on HBASE-5229:
--



bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4152
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152
bq.  
bq.   What if rm contains more than one Mutation ?

Hopefully rm does contain more than one Mutation, otherwise using this API is 
pointless. :)
It is guaranteed, though, that all Mutations are for this single row.

Do you see a concern?


bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4171
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171
bq.  
bq.   else is not needed considering exception is thrown on line 4170.

Right. But this makes the flow clear. Personally I am not a big fan of having 
to look through code and piece together the control flow by tracking exceptions 
and return statements.
I don't mind changing it, though.
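
The pattern under discussion, in miniature (illustrative only; applyMutations 
is a stand-in):

{code}
import java.io.IOException;

class ControlFlowExample {
  void mutate(boolean rowsMatch) throws IOException {
    if (!rowsMatch) {
      throw new IOException("rows do not match");
    } else {          // redundant 'else': the throw above already exits
      applyMutations();
    }
  }

  private void applyMutations() {
    // stand-in for the real work
  }
}
{code}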


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4844
---


On 2012-02-06 19:51:58, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-06 19:51:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the 
application ensures that all rows reside in the same Region (by presplitting or 
a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow 
atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature, and this is prominently called out in 
the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
 1241120 
bq.  
bq.  Diff: https://reviews.apache.org/r/3748/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests added to TestFromClientSide and TestAtomicOperation
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.

[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201679#comment-13201679
 ] 

Phabricator commented on HBASE-5074:


mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Some more comments. I am still concerned about the copy-paste stuff in 
backwards-compatibility checking. Is there a way to minimize that?

  I also mentioned this in the comments below, but it would probably make sense 
to add more canned files in the no-checksum format generated by the old 
writer and read them with the new reader, the same way HFile v1 compatibility 
is ensured. I don't mind keeping the old writer code around in the unit test, 
but I think it is best to remove as much code from that legacy writer as 
possible (e.g. versatile API, toString, etc.) and only leave the parts 
necessary to generate the file for testing.

INLINE COMMENTS
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:164
 Long line
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:83
 Can this be made private if it is not accessed outside of this class?
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:78
 Use ALL_CAPS for constants
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76
 There seems to be a lot of copy-and-paste from the old HFileBlock code here. 
Is there a way to reduce that?

  I think we also need to create some canned old-format HFiles (using the old 
code) and read them with the new reader code as part of the test.
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365
 Make this class final.

  Also, it would make sense to strip this class down as much as possible to 
maintain the bare minimum of code required to test compatibility (if you have 
not done that already).
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800
 Do we ever use this function?
  
src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java:188
 Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.
  
src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java:356 
Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java:300
 Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.
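
The constant being requested above, in a minimal sketch (the name is assumed):

{code}
public final class HFileVersionsSketch {
  /** Assumed name: the minor version written before per-block checksums. */
  public static final int MINOR_VERSION_NO_CHECKSUM = 0;

  private HFileVersionsSketch() {
  }
}
{code}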

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201678#comment-13201678
 ] 

Phabricator commented on HBASE-5074:


mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Some more comments. I am still concerned about the copy-paste stuff in 
backwards-compatibility checking. Is there a way to minimize that?

  I also mentioned this in the comments below, but it would probably make sense 
to add more canned files in the no-checksum format generated by the old 
writer and read them with the new reader, the same way HFile v1 compatibility 
is ensured. I don't mind keeping the old writer code around in the unit test, 
but I think it is best to remove as much code from that legacy writer as 
possible (e.g. versatile API, toString, etc.) and only leave the parts 
necessary to generate the file for testing.

INLINE COMMENTS
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:164
 Long line
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:83
 Can this be made private if it is not accessed outside of this class?
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:78
 Use ALL_CAPS for constants
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76
 There seems to be a lot of copy-and-paste from the old HFileBlock code here. 
Is there a way to reduce that?

  I think we also need to create some canned old-format HFiles (using the old 
code) and read them with the new reader code as part of the test.
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365
 Make this class final.

  Also, it would make sense to strip this class down as much as possible to 
maintain the bare minimum of code required to test compatibility (if you have 
not done that already).
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800
 Do we ever use this function?
  
src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java:188
 Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.
  
src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java:356 
Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java:300
 Is 0 the minor version with no checksums? If so, please replace it with a 
constant for readability.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Created) (JIRA)
HBase build artifact should include security code by default 


 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar


HBase 0.92.0 was released with two artifacts, plain and security. The security 
code is built with -Psecurity. There are two tarballs, but only the plain jar 
is in the maven repo at repository.a.o. 

I see no reason to do a separate artifact for the security related code, since 
0.92 already depends on secure Hadoop 1.0.0, and none of the security related 
code is loaded by default. In this issue, I propose we merge the code under 
/security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5341:
-

Component/s: security
 build

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201720#comment-13201720
 ] 

Enis Soztutar commented on HBASE-5341:
--

I can provide a patch if we agree on this. 

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5342) Grant/Revoke global permissions

2012-02-06 Thread Enis Soztutar (Created) (JIRA)
Grant/Revoke global permissions
---

 Key: HBASE-5342
 URL: https://issues.apache.org/jira/browse/HBASE-5342
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar


HBASE-3025 introduced simple ACLs based on coprocessors. It defines 
global/table/cf/cq level permissions. However, there is no way to grant/revoke 
global level permissions, other than the hbase.superuser conf setting. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201818#comment-13201818
 ] 

Enis Soztutar commented on HBASE-5341:
--

Also, there is no secure artifact in the maven repo, so depending on when 
0.92.1 is cut, we might want to push 0.92.0-security as well. 

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5343) Access control API in HBaseAdmin.java

2012-02-06 Thread Enis Soztutar (Created) (JIRA)
Access control API in HBaseAdmin.java  
---

 Key: HBASE-5343
 URL: https://issues.apache.org/jira/browse/HBASE-5343
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar


To use the access control mechanism added in HBASE-3025, users should either 
use the shell interface, or use the coprocessor API directly, which is not very 
user friendly. We can add grant/revoke/user_permission commands similar to the 
shell interface to HBaseAdmin assuming HBASE-5341 is in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Gregory Chanan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201841#comment-13201841
 ] 

Gregory Chanan commented on HBASE-5317:
---

@Ted:
FSUtils.getTableDirs() excludes a specific list of directories.  Specifically:
{code}
 Arrays.asList(new String[]{ HREGION_LOGDIR_NAME, HREGION_OLDLOGDIR_NAME,
  CORRUPT_DIR_NAME, Bytes.toString(META_TABLE_NAME),
  Bytes.toString(ROOT_TABLE_NAME), SPLIT_LOGDIR_NAME }));
{code}

So if the MiniMRCluster creates a target directory, it will be returned.
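
With the constant values spelled out (directory names as of this era, listed 
here purely for illustration), the effect of that blacklist is:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Anything under the HBase root dir NOT in this set is treated as a table
// dir, so a stray "target" dir created by the MiniMRCluster slips through.
Set<String> nonTableDirs = new HashSet<String>(Arrays.asList(
    ".logs",       // HREGION_LOGDIR_NAME
    ".oldlogs",    // HREGION_OLDLOGDIR_NAME
    ".corrupt",    // CORRUPT_DIR_NAME
    ".META.",      // META_TABLE_NAME
    "-ROOT-",      // ROOT_TABLE_NAME
    "splitlog"));  // SPLIT_LOGDIR_NAME
boolean treatedAsTable = !nonTableDirs.contains("target");  // true
{code}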

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201848#comment-13201848
 ] 

Zhihong Yu commented on HBASE-5317:
---

Right.
Can we find out why this target directory was created?

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-02-06 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201862#comment-13201862
 ] 

Jean-Daniel Cryans commented on HBASE-3134:
---

We can't hit ZK every time we replicate in order to see what the state is; each 
RS should instead have a watcher and do the check locally. The rest looks good, 
thanks a lot for working on this.
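
A rough sketch of that watcher-based approach, with a hypothetical znode path 
and state encoding (this is the shape of the idea, not the actual patch):

{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Cache the replication state locally; refresh only when ZK fires a watch,
// so the per-edit check never leaves the regionserver.
public class ReplicationStateCache implements Watcher {
  private final ZooKeeper zk;
  private final String stateZnode;  // e.g. "/hbase/replication/state" (assumed)
  private volatile boolean enabled;

  public ReplicationStateCache(ZooKeeper zk, String stateZnode)
      throws KeeperException, InterruptedException {
    this.zk = zk;
    this.stateZnode = stateZnode;
    refresh();
  }

  // Called on the replication path: purely local, no ZK round trip.
  public boolean isEnabled() {
    return enabled;
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      refresh();  // re-read the state and re-arm the watch
    } catch (Exception e) {
      enabled = false;  // fail safe: treat the stream as disabled
    }
  }

  private void refresh() throws KeeperException, InterruptedException {
    byte[] data = zk.getData(stateZnode, this, null);  // re-arms the watch
    enabled = data != null && data.length > 0 && data[0] == 1;
  }
}
{code}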

 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Teruyoshi Zenmyo
Priority: Minor
  Labels: replication
 Fix For: 0.94.0

 Attachments: HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and when want to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201867#comment-13201867
 ] 

Zhihong Yu commented on HBASE-5341:
---

If we remove the maven 'security' profile, only secure HBase artifacts would be 
built, right?
Since most users wouldn't be using secure HBase features, I think this might 
introduce confusion for them.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5333) Introduce Memstore backpressure for writes

2012-02-06 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201880#comment-13201880
 ] 

Jean-Daniel Cryans commented on HBASE-5333:
---

I've done some brainstorming with Stack and the result was HBASE-5162.

 Introduce Memstore backpressure for writes
 

 Key: HBASE-5333
 URL: https://issues.apache.org/jira/browse/HBASE-5333
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl

 Currently if the memstore/flush/compaction cannot keep up with the write 
 load, we block writers up to hbase.hstore.blockingWaitTime milliseconds (the 
 default is 90000).
 Would be nice if there was a concept of a soft backpressure that slows 
 writing clients gracefully *before* we reach this condition.
 From the log:
 2012-02-04 00:00:06,963 WARN 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
 table,,1328313512779.c2761757621ddf8fb78baf5288d71271. has too many store 
 files; delaying flush up to 90000ms
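
 To make the idea concrete, one hypothetical shape for such a soft limit 
 (names and thresholds invented for illustration) is a per-write delay that 
 grows as the region approaches the hard blocking condition:

{code}
// Sketch only: scale a per-write delay by how far the region sits between
// a "soft" threshold and the hard blocking threshold.
static long softBackpressureDelayMs(long memstoreSize, long softLimit,
    long hardLimit, long maxDelayMs) {
  if (memstoreSize <= softLimit) {
    return 0;  // no pressure yet
  }
  if (memstoreSize >= hardLimit) {
    return maxDelayMs;  // about to hit the hard block anyway
  }
  double fraction =
      (double) (memstoreSize - softLimit) / (hardLimit - softLimit);
  return (long) (fraction * maxDelayMs);  // graceful ramp-up
}
{code}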

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201881#comment-13201881
 ] 

Enis Soztutar commented on HBASE-5341:
--

The only artifact built will be plain 0.92.1 or 0.94 (no -security appended), 
but it will include the security related code. It's like Hadoop 1.0.0, which 
ships the security related code in a single artifact. 

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201884#comment-13201884
 ] 

Jean-Daniel Cryans commented on HBASE-5267:
---

I'm +1 with the patch, but eventually I'd still like to see something in the 
book about it.

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.
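
 If the flag keeps the proposed name, the guard would look something like 
 this (a sketch, assuming a boolean config that defaults to false):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Proposed behavior: the off-heap slab cache stays off unless explicitly
// enabled, even when -XX:MaxDirectMemorySize is set on the JVM.
boolean offHeapCache = conf.getBoolean("hbase.offheapcachesize", false);
if (offHeapCache) {
  // ... only then wire up the SlabCache
}
{code}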

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201885#comment-13201885
 ] 

Zhihong Yu commented on HBASE-5341:
---

The certification for 0.92.0 was for insecure HBase artifact.
If we only produce one secure artifact, would certification process change ?

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201898#comment-13201898
 ] 

Enis Soztutar commented on HBASE-5341:
--

Sorry, I did not understand what you are referring to with the certification 
process. Do you mean voting for the RC, signing the release, etc? 

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201901#comment-13201901
 ] 

Zhihong Yu commented on HBASE-5267:
---

I was about to create a sub-task but found that '3.2.10. Experimental off-heap 
cache' points to:
http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/

Would an update of that blog suffice?

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201907#comment-13201907
 ] 

Zhihong Yu commented on HBASE-5341:
---

Yes.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201909#comment-13201909
 ] 

Jean-Daniel Cryans commented on HBASE-5267:
---

Pointing to it might be a good option, along with a line or two on how to use 
it.

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5267:
--

Attachment: 5267v4.txt

Patch v4 adds the new config parameter to src/docbkx/upgrading.xml

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt, 5267v4.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201917#comment-13201917
 ] 

Enis Soztutar commented on HBASE-5341:
--

I don't see a reason for changing the release process. The vote for 0.92.0 
release included both the plain and secure artifacts, see 
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.devel/25671

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201928#comment-13201928
 ] 

Zhihong Yu commented on HBASE-5341:
---

If you search http://search-hadoop.com for the voting thread 
'ANN: The fifth hbase 0.92.0 release candidate is available for download', 
it is hard to tell whether the voters tested the secure HBase tarball.

If we produce one artifact (I think we should), some voters will have to test 
the security features before we declare a new release.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default

2012-02-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201964#comment-13201964
 ] 

Hadoop QA commented on HBASE-5267:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12513543/5267v4.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -136 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 156 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/909//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/909//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/909//console

This message is automatically generated.

 Add a configuration to disable the slab cache by default
 

 Key: HBASE-5267
 URL: https://issues.apache.org/jira/browse/HBASE-5267
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt, 5267v4.txt


 From what I commented at the tail of HBASE-4027:
 {quote}
 I changed the release note, the patch doesn't have a hbase.offheapcachesize 
 configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize 
 (which is actually a big problem when you consider this: 
 http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). 
 {quote}
 We need to add hbase.offheapcachesize and set it to false by default.
 Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201983#comment-13201983
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 can you 
please elaborate more on this comment?
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76
 I think it is better to keep the compatibility code separate from existing 
live-test code. That way, it is guaranteed to never change.

  Is there any other existing unit test that keeps a version1 file to run unit 
tests against?
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365
 I did not strip it down, just so that it remains as it was earlier. This is 
for backward compatibility, so isn't it better to keep it as it was?
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800
 Was useful while testing, but I will get rid of it.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201990#comment-13201990
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 But CRC32C is 
not installed by default.  You would need hadoop 2.0 (not yet released) to get 
that.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201997#comment-13201997
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4852
---


I tried to use the command line tool to compress an HLog written by 0.92 and 
got the following:

Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192)
at 
org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104)
at 
org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64)

Also, if you use the command line tool with no arguments, it should print its 
help (right now it throws an IndexOutOfBoundsException).

I'll try again with an hlog written by trunk - I'm guessing the hlog 
serialization version might have changed or something.

- Todd


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current plan involves using a dictionary to compress table name, region 
 id, cf name, and possibly other bits of repeated data. Also, HLog format may 
 be changed in other ways to produce a smaller HLog.
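
 As a sketch of the dictionary idea (illustrative only; the real plan bounds 
 the dictionary, e.g. with LRU eviction, which is omitted here): the first 
 time a byte sequence is written it goes out verbatim and enters the 
 dictionary, and afterwards only a short index is written, with reader and 
 writer building identical dictionaries in lockstep.

{code}
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class WalDictionarySketch {
  private final Map<ByteBuffer, Short> toIndex =
      new HashMap<ByteBuffer, Short>();
  private short next = 0;

  /** Returns the existing index, or -1 after adding a new entry. */
  short findOrAdd(byte[] data) {
    ByteBuffer key = ByteBuffer.wrap(data);
    Short idx = toIndex.get(key);
    if (idx != null) {
      return idx;              // known entry: emit only the index
    }
    toIndex.put(key, next++);  // new entry: emit raw bytes; both sides add it
    return -1;
  }
}
{code}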

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Enis Soztutar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202000#comment-13202000
 ] 

Enis Soztutar commented on HBASE-5341:
--

Agreed. Conceptually, security related features are not very different from 
other features. They can be included in the code base, disabled by default, and 
marked as experimental if not well tested. 

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and none of the security 
 related code is loaded by default. In this issue, I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202024#comment-13202024
 ] 

Phabricator commented on HBASE-5074:


tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 I don't see 
PureJavaCrc32 in hadoop 1.0 either.
  I think it would be nice to default to the best checksum class.
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 Would 
hbase.hstore.checksum.algo be a better name for this config parameter?
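
  One common shape for defaulting to the best available checksum class is to 
probe for the pure-Java CRC32 via reflection and fall back to the JDK's CRC32 
(a sketch, not the patch under review):

{code}
import java.util.zip.CRC32;
import java.util.zip.Checksum;

static Checksum newBestChecksum() {
  try {
    // Present only in Hadoop builds that ship PureJavaCrc32.
    Class<?> clazz = Class.forName("org.apache.hadoop.util.PureJavaCrc32");
    return (Checksum) clazz.newInstance();
  } catch (Exception e) {
    return new CRC32();  // always available in the JDK
  }
}
{code}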

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata (checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202026#comment-13202026
 ] 

Phabricator commented on HBASE-5292:


mbautin has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent 
counting getSize on compactions.

  @zhiqiu: does this problem exist in the open-source HBase trunk? If so, could 
you please port this patch to trunk? If this is not applicable to trunk, could 
you please set the JIRA status to resolved? Thanks!

REVISION DETAIL
  https://reviews.facebook.net/D1527


 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, 
 D1527.4.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.
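
 The fix amounts to guarding that update on the scan type; a sketch (the 
 flag name is illustrative, not the actual patch):

{code}
// Inside StoreScanner.next(), after the matcher returns an INCLUDE* code:
if (!isCompactionScan) {  // hypothetical flag for compaction-driven scanners
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
}
{code}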

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover

2012-02-06 Thread Mikhail Bautin (Created) (JIRA)
[89-fb] Scan unassigned region directory on master failover
---

 Key: HBASE-5344
 URL: https://issues.apache.org/jira/browse/HBASE-5344
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


In case the master dies after a regionserver writes region state as OPENED or 
CLOSED in ZK but before the update is received by master and written to meta, 
the new master that comes up has to pick up the region state from ZK and write 
it to meta. Otherwise we can get multiply-assigned regions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover

2012-02-06 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5344:
---

Attachment: D1605.1.patch

mbautin requested code review of [jira] [HBASE-5344] [89-fb] Scan unassigned 
region directory on master failover.
Reviewers: Kannan, Karthik, Liyin, JIRA, stack

  In case the master dies after a regionserver writes region state as OPENED or 
CLOSED in ZK but before the update is received by master and written to meta, 
the new master that comes up has to pick up the region state from ZK and write 
it to meta. Otherwise we can get multiply-assigned regions.

  The current solution tries to reassign the root region if it is unassigned 
but does not implement a workaround if META regions are missing. Also, it 
currently relies heavily on direct scanning of regionservers (reading the 
regionserver list from ZK and doing an RPC on each regionserver to get the list 
of online regions). We were already doing that in master failover, but I am 
making it parallel here, as in the sketch below.
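
  A sketch of that parallel fan-out, with the server list and RPC wrapper as 
hypothetical placeholders (exception handling elided):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One "list online regions" RPC per regionserver, fanned out on a pool
// instead of a serial loop over the server list read from ZK.
static void scanRegionServersInParallel(final List<String> servers)
    throws Exception {
  ExecutorService pool =
      Executors.newFixedThreadPool(Math.min(servers.size(), 16));
  List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();
  for (final String server : servers) {
    futures.add(pool.submit(new Callable<List<String>>() {
      public List<String> call() throws IOException {
        return listOnlineRegions(server);  // hypothetical RPC wrapper
      }
    }));
  }
  for (Future<List<String>> f : futures) {
    reconcile(f.get());  // hypothetical: write missing region states to meta
  }
  pool.shutdown();
}
{code}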

TEST PLAN
  Unit tests, dev cluster, dark launch with killing regionservers and master

REVISION DETAIL
  https://reviews.facebook.net/D1605

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionEventData.java
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
  src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
  src/main/java/org/apache/hadoop/hbase/master/DirectRegionServerScanner.java
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  src/main/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java
  src/main/java/org/apache/hadoop/hbase/master/RegionManager.java
  src/main/java/org/apache/hadoop/hbase/master/RootScanner.java
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
  src/main/java/org/apache/hadoop/hbase/master/ZKUnassignedWatcher.java
  
src/main/java/org/apache/hadoop/hbase/master/handler/MasterOpenRegionHandler.java
  
src/test/java/org/apache/hadoop/hbase/master/TestRegionStateOnMasterFailure.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/3429/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 [89-fb] Scan unassigned region directory on master failover
 ---

 Key: HBASE-5344
 URL: https://issues.apache.org/jira/browse/HBASE-5344
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1605.1.patch


 In case the master dies after a regionserver writes region state as OPENED or 
 CLOSED in ZK but before the update is received by master and written to meta, 
 the new master that comes up has to pick up the region state from ZK and 
 write it to meta. Otherwise we can get multiply-assigned regions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4608) HLog Compression

2012-02-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202032#comment-13202032
 ] 

jirapos...@reviews.apache.org commented on HBASE-4608:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4853
---


I tried the compression tool on a log created by YCSB in load mode with the 
standard dataset. Since the values are fairly large here (100 bytes) it didn't 
get a huge compression ratio - from about 64MB down to 52MB (~20%). But still 
not bad. I looked at the resulting data using xxd, and it looks like there are 
still a number of places where we could use variable-length integers instead of 
fixed-length ones. I wrote a quick C program to count the number of 0x00 
bytes in the log and found about 3MB worth (~5%). Since the actual table data 
is all human-readable text in this case, all of the 0x00s should be able to be 
compressed away, I think.

I also tested on a YCSB workload where each row has 1000 columns of 4 bytes 
each (similar to an indexing workload) and the compression ratio was 60% (64M 
down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.
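
(For reference, a rough Java equivalent of the zero-byte count Todd describes doing in C; the class name here is arbitrary:)

{code}
import java.io.FileInputStream;
import java.io.IOException;

// Counts 0x00 bytes in a log file to estimate how much fixed-width
// integer padding a varint encoding could eliminate.
public class ZeroByteCounter {
  public static void main(String[] args) throws IOException {
    long zeros = 0, total = 0;
    FileInputStream in = new FileInputStream(args[0]);
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        total += n;
        for (int i = 0; i < n; i++) {
          if (buf[i] == 0) zeros++;
        }
      }
    } finally {
      in.close();
    }
    System.out.printf("%d of %d bytes are 0x00 (%.1f%%)%n",
        zeros, total, 100.0 * zeros / total);
  }
}
{code}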


src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/2740/#comment10650

invert the order of these || clauses - otherwise you get an out-of-bounds 
exception just from running the tool with no arguments
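
A minimal sketch of that reordering (the flag names come from the usage text below; the method shape is an assumption, not the actual Compressor code):

{code}
// Sketch only: with ||, the length guard must come first so that
// short-circuit evaluation never indexes into an empty args array.
private static boolean badArgs(String[] args) {
  // Wrong: args[0] is dereferenced before the length check can run.
  //   if (args[0].equals("-u") || args.length != 3) ...
  // Right: the length test short-circuits the args[0] access.
  return args.length != 3
      || (!args[0].equals("-u") && !args[0].equals("-c"));
}
{code}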



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/2740/#comment10651

I think the better way of expressing this usage would be:

WALCompressor [-u | -c] input output

  -u - uncompresses the input log
  -c - compresses the output log

Exactly one of -u or -c must be specified





src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
https://reviews.apache.org/r/2740/#comment10649

this code doesn't work properly. Here's what you want to do:

  Configuration conf = new Configuration();
  FileSystem fs = path.getFileSystem(conf);



- Todd


On 2012-01-24 22:29:18, Li Pi wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  ---
bq.  
bq.  (Updated 2012-01-24 22:29:18)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HLog compression. Has unit tests and a command line tool for 
compressing/decompressing.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.  https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 
f067221 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
 d9cd6de 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
 cbef70f 
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java 
e1117ef 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java 
PRE-CREATION 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
59910bf 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.



 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 
 4608v8fixed.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. Current 

[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202050#comment-13202050
 ] 

Phabricator commented on HBASE-5292:


zhiqiu has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent 
counting getSize on compactions.

  @mbautin Sure. I'll port it to open-source trunk right now. Thank you so much 
for reminding me about this. :D

REVISION DETAIL
  https://reviews.facebook.net/D1527


 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, 
 D1527.4.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5301) Some RegionServer metrics have really confusing names

2012-02-06 Thread Otis Gospodnetic (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-5301:


Component/s: metrics
Description: 
Mikael Sitruk commented on this back in Nov 2011 and after looking at this I 
completely agree with him.  For example, flushSize_avg_time makes no sense.  
flushSize is in bytes, so is this the average flush size?  Or the average 
time per flush?  In which case, why not call the measure flush_avg_time?  But 
to add to the confusion there is already a flushTime_avg_time metric.  There 
are also flushTime_num_ops and flushSize_num_ops metrics that are confusing.  Is the 
former the number of flushes?  In which case, why have time in the metric 
name?


On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote:

Hi

I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to
3 categories (missing, present but not documented/incomplete documentation, OK)
according to their status in the book
(http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update
the book accordingly?
It seems also that rpc metrics are not documented at all.

And now some questions on the metrics:
I can see some metrics present a num_ops and avg_time suffix (like rpc), but
it seems that for certain metrics it is totally unclear (to me at least) or
their name is misleading - for example, what do
compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time
and flushSize_num_ops? I mean, I would have understood compaction_avg_time
and flushSize or flush_avg_time.


  was:

Mikael Sitruk commented on this back in Nov 2011 and after looking at this I 
completely agree with him.  For example, flushSize_avg_time makes no sense.  
flushSize is in bytes, so is this the average flush size?  Or the average 
time per flush?  In which case, why not call the measure flush_avg_time?  But 
to add to the confusion there is already a flushTime_avg_time metric.  There 
are also flushTime_num_ops and flushSize_num_ops metrics that are confusing.  Is the 
former the number of flushes?  In which case, why have time in the metric 
name?


On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote:

Hi

I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to
3 categories (missing, present but not documented/incomplete documentation, OK)
according to their status in the book
(http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update
the book accordingly?
It seems also that rpc metrics are not documented at all.

And now some questions on the metrics:
I can see some metrics present a num_ops and avg_time suffix (like rpc), but
it seems that for certain metrics it is totally unclear (to me at least) or
their name is misleading - for example, what do
compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time
and flushSize_num_ops? I mean, I would have understood compaction_avg_time
and flushSize or flush_avg_time.


 Labels: metrics  (was: )

 Some RegionServer metrics have really confusing names
 -

 Key: HBASE-5301
 URL: https://issues.apache.org/jira/browse/HBASE-5301
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: Doug Meil
  Labels: metrics

 Mikael Sitruk commented on this back in Nov 2011 and after looking at this I 
 completely agree with him.  For example, flushSize_avg_time makes no sense. 
  flushSize is in bytes, so is this the average flush size?  Or the average 
 time per flush?  In which case, why not call the measure flush_avg_time?  
 But to add to the confusion there is already a flushTime_avg_time metric.  
 There are also flushTime_num_ops and flushSize_num_ops metrics that are confusing. 
  Is the former the number of flushes?  In which case, why have time in the 
 metric name?
 On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote:
 Hi
 I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to
 3 categories (missing, present but not documented/incomplete documentation, OK)
 according to their status in the book
 (http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update
 the book accordingly?
 It seems also that rpc metrics are not documented at all.
 And now some questions on the metrics:
 I can see some metrics present a num_ops and avg_time suffix (like rpc), but
 it seems that for certain metrics it is totally unclear (to me at least) or
 their name is misleading - for example, what do
 compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time
 and flushSize_num_ops? I mean, I would have understood compaction_avg_time
 and flushSize or flush_avg_time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Anoop Sam John (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202057#comment-13202057
 ] 

Anoop Sam John commented on HBASE-2038:
---

Hi Alex,
Thanks for your reply...  Yes, I had seen your past comment. I am checking 
the trunk coprocessor code for this work as of now...

What is your comment on my first point, that HRegionServer.next(final 
long scannerId, int nbRows) calls the coprocessor preScannerNext() passing 
the RegionScanner? On this we cannot make a seek().

Thanks
Anoop


 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Andrew Purtell
Priority: Minor

 HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a 
 good goalpost for coprocessor environment design -- there should be enough of 
 it so region level indexing can be reimplemented as a coprocessor without any 
 loss of functionality. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.

2012-02-06 Thread ramkrishna.s.vasudevan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-5321.
---

Resolution: Fixed

Committed to 0.90.

 this.allRegionServersOffline  not set to false after one RS comes online and 
 assignment is done in 0.90.
 

 Key: HBASE-5321
 URL: https://issues.apache.org/jira/browse/HBASE-5321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5321.patch


 In HBASE-5160 we do not wait for TM to assign the regions after the first RS 
 comes online.
 After doing this, the variable this.allRegionServersOffline needs to be reset, 
 which is not done in 0.90.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-06 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5323:
--

Attachment: HBASE-5323.patch

Patch for 0.90

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7

 Attachments: HBASE-5323.patch, HBASE-5323.patch


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We try to bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread and so further assignment does not 
 happen.  If ROOT and META are to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such exceptions.
 Please do suggest.
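
 A schematic of the proposed handling (the method shapes here are illustrative, 
 not the attached patch):
 {code}
 // Catch the AssertionError where log splitting is driven, so the
 // failure aborts the master instead of silently killing the
 // shutdown-handler thread and stalling ROOT/META assignment.
 try {
   master.splitLog(serverName);            // may throw AssertionError
 } catch (AssertionError e) {
   master.abort("HLog length mismatch while splitting log for "
       + serverName, e);
   return;
 }
 // ... continue with region reassignment ...
 {code}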

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java

2012-02-06 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202081#comment-13202081
 ] 

Gary Helmling commented on HBASE-5343:
--

Adding coprocessor-specific methods to {{HBaseAdmin}} completely undermines the 
purpose of coprocessors as optionally enabled extensions, and fails to scale as 
features are added. Having {{HBaseAdmin}} be a jumble of methods related to 
specific coprocessors is not very user-friendly either.

Security usage requires that {{SecureRpcEngine}} be loaded and that 
{{AccessController}} be enabled. Yes, configuring these components is more 
complicated than it needs to be right now. But providing interfaces to these 
two optional components as a permanent part of the client-facing API presented 
by {{HBaseAdmin}} is not the solution.

If {{AccessControllerProtocol}} is too difficult to work with, then I think we 
would be better off with a simple client helper, like a {{SecurityClient}} 
class similar to the {{Constraints}} helper that was implemented for the 
constraints coprocessor.

 Access control API in HBaseAdmin.java  
 ---

 Key: HBASE-5343
 URL: https://issues.apache.org/jira/browse/HBASE-5343
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 To use the access control mechanism added in HBASE-3025, users should either 
 use the shell interface, or use the coprocessor API directly, which is not 
 very user friendly. We can add grant/revoke/user_permission commands similar 
 to the shell interface to HBaseAdmin assuming HBASE-5341 is in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing

2012-02-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202088#comment-13202088
 ] 

Lars Hofhansl commented on HBASE-2038:
--

Unfortunately there is no seeking in the coprocessors, yet. They work more like 
a filter over a real scan. Seeking is done one level (or two, actually) 
deeper: seeking happens in the StoreScanners, while coprocessors see RegionScanners.

It is not entirely clear to me where to hook this up in that API.

It might be possible to provide a custom filter to do that. Filters operate at 
the storescanner level, and so can (and do) provide seek hints to the calling 
scanner.
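
A rough sketch of that filter route (FilterBase, ReturnCode.SEEK_NEXT_USING_HINT, and getNextKeyHint are the 0.92-era filter APIs; the index probe itself is hypothetical):

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;

// Filters run at the store-scanner level, so a seek hint returned here
// actually reaches the seek machinery; coprocessors today do not get
// that deep. The region-local index lookup is a placeholder.
public class IndexSeekFilter extends FilterBase {
  private KeyValue hint;

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    hint = lookupRegionIndex(kv);   // hypothetical index probe
    return hint == null ? ReturnCode.INCLUDE
                        : ReturnCode.SEEK_NEXT_USING_HINT;
  }

  @Override
  public KeyValue getNextKeyHint(KeyValue currentKV) {
    return hint;
  }

  private KeyValue lookupRegionIndex(KeyValue kv) {
    return null;  // placeholder: return the next key worth reading
  }

  // Writable plumbing required by the Filter interface of that era
  public void write(DataOutput out) throws IOException {}
  public void readFields(DataInput in) throws IOException {}
}
{code}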

 Coprocessors: Region level indexing
 ---

 Key: HBASE-2038
 URL: https://issues.apache.org/jira/browse/HBASE-2038
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors
Reporter: Andrew Purtell
Priority: Minor

 HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a 
 good goalpost for coprocessor environment design -- there should be enough of 
 it so region level indexing can be reimplemented as a coprocessor without any 
 loss of functionality. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202091#comment-13202091
 ] 

Gary Helmling commented on HBASE-5341:
--

This would break the ability to compile HBase 0.92+ against Hadoop releases 
without security.  Even though we currently compile against 1.0 by default, we 
haven't blocked the ability to compile against previous versions.  So this would 
be a change, especially if there's anyone out there running on builds of the 
0.20-append branch.

We've also been discussing moving in the direction of a modular build (see 
HBASE-4336).  The current security/ tree is practically a module already, just 
lacking its own pom.xml.  Moving security/ back up into src/ would be a step 
in the opposite direction, keeping us with a monolithic release.  This 
may make the packaging slightly simpler, but it still won't make it any easier 
to test out all the combinations of optional components.  The security profile 
does currently use its own configuration for testing, so at least we get 
execution of the full test suite using {{SecureRpcEngine}}.  The full test 
suite is really overkill for this purpose; we could get by with a good set of 
RPC-focused tests if we had them.  But in my opinion that kind of focused testing 
would be easier to handle in a security module than as part of a monolithic 
build.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security-related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and the security-related 
 code is not loaded by default. In this issue I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624
 ] 

Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM:
--


bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4152
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152
bq.  
bq.   What if rm contains more than one Mutation ?

Hopefully rm does contain more than one Mutation, otherwise using this API is 
pointless. :)
It is guaranteed, though, that all Mutations are for this single row.

Do you see a concern?


bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4171
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171
bq.  
bq.   else is not needed considering exception is thrown on line 4170.

Right. But this makes the flow clear. Personally I am not a big fan of having 
to look through code and having to piece together the control flow by tracking 
exceptions and return statements.
I don't mind changing it, though.


- Lars


  was (Author: jirapos...@reviews.apache.org):


bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4152
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152
bq.  
bq.   What if rm contains more than one Mutation ?

Hopefully rm does contain more than one Mutation, otherwise using this API is 
pointless. :)
It is guaranteed, though, that all Mutations are for this single row.

Do you see a concern?


bq.  On 2012-02-06 20:24:17, Ted Yu wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
 line 4171
bq.   https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171
bq.  
bq.   else is not needed considering exception is thrown on line 4170.

Right. But this makes the flow clear. Personally I am not a big fan of having 
to look through code and having to piece together the control flow by tracking 
exceptions and return statements.
I don't mind changing it, though.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4844
---


On 2012-02-06 19:51:58, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-06 19:51:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the 
application ensures that all rows reside in the same Region (by presplitting or 
a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow 
atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature and this is prominently called out in 
the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
 1241120 
bq.

[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556
 ] 

Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4844
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10643

What if rm contains more than one Mutation ?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10642

else is not needed considering exception is thrown on line 4170.


- Ted


  was (Author: jirapos...@reviews.apache.org):

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/#review4844
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10643

What if rm contains more than one Mutation ?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/3748/#comment10642

else is not needed considering exception is thrown on line 4170.


- Ted


On 2012-02-06 19:51:58, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3748/
bq.  ---
bq.  
bq.  (Updated 2012-02-06 19:51:58)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This builds on HBASE-3584, HBASE-5203, and HBASE-5304.
bq.  
bq.  Multiple Rows can be locked and applied atomically as long as the 
application ensures that all rows reside in the same Region (by presplitting or 
a custom RegionSplitPolicy).
bq.  At SFDC we can use this to colocate subsets of a tenant's data and allow 
atomic operations over these subsets.
bq.  
bq.  Obviously this is an advanced feature and this is prominently called out in 
the Javadoc.
bq.  
bq.  
bq.  This addresses bug HBASE-5229.
bq.  https://issues.apache.org/jira/browse/HBASE-5229
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
 1241120 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java
 1241120 
bq.  
bq.  Diff: https://reviews.apache.org/r/3748/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Tests added to TestFromClientSide and TestAtomicOperation
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.


  
 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for 

[jira] [Commented] (HBASE-5342) Grant/Revoke global permissions

2012-02-06 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202093#comment-13202093
 ] 

Gary Helmling commented on HBASE-5342:
--

Some of the building blocks for this are already in place.  It shouldn't be too 
difficult to fill in the missing pieces.  Would be great to see this completed.

 Grant/Revoke global permissions
 ---

 Key: HBASE-5342
 URL: https://issues.apache.org/jira/browse/HBASE-5342
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBASE-3025 introduced simple ACLs based on coprocessors. It defines 
 global/table/cf/cq level permissions. However, there is no way to 
 grant/revoke global level permissions, other than the hbase.superuser conf 
 setting. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202097#comment-13202097
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 my choice would 
be to make Java's CRC32 the default. PureJavaCrc32 is compatible with Java's 
CRC32. However, PureJavaCrc32C is not compatible with either of these.

  Although PureJavaCrc32 is not part of 1.0, if and when you move to hadoop 
2.0, you will automatically get the more performant algorithm via 
PureJavaCrc32.

  For the adventurous, one can manually pull PureJavaCrc32C into one's own 
hbase deployment by explicitly setting hbase.hstore.checksum.algorithm to 
CRC32C.

  Does that sound reasonable?
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 sounds 
good, will make this change.
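
  A minimal sketch of that reflection-based fallback (the factory class is an assumption, not the patch's actual code; PureJavaCrc32 is Hadoop's class name):

{code}
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Prefer Hadoop's PureJavaCrc32 when it is on the classpath, and fall
// back to java.util.zip.CRC32, which produces identical CRC-32 values.
public final class ChecksumFactory {
  private static final String PURE_JAVA =
      "org.apache.hadoop.util.PureJavaCrc32";

  public static Checksum newCrc32() {
    try {
      return (Checksum) Class.forName(PURE_JAVA).newInstance();
    } catch (Exception e) {
      return new CRC32();  // same polynomial, just slower
    }
  }
}
{code}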

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-02-06 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5292:
---

Attachment: D1617.1.patch

zhiqiu requested code review of [jira] [HBASE-5292] Prevent counting getSize 
on compactions.
Reviewers: Kannan, mbautin, Liyin, JIRA

  Added two separate metrics for both get() and next(). This is done by
  refactoring the internal next() API. To be more specific, only Get.get()
  and ResultScanner.next() pass the metric name (getsize and
  nextsize respectively) to
HRegion::RegionScanner::next(List<KeyValue>, String)

  This will eventually hit StoreScanner::next(List<KeyValue>,
  int, String) where the metrics are counted.

  And their call paths are:

  1) Get

  HTable::get(final Get get)
  => HRegionServer::get(byte [] regionName, Get get)
  => HRegion::get(final Get get, final Integer lockid)
  => HRegion::get(final Get get)  [pass METRIC_GETSIZE to the
  callee]

  => HRegion::RegionScanner::next(List<KeyValue> outResults, String
  metric)
  => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit,
  String metric)
  => HRegion::RegionScanner::nextInternal(int limit, String metric)
  => KeyValueHeap::next(List<KeyValue> result, int limit, String
  metric)
  => StoreScanner::next(List<KeyValue> outResult, int limit, String
  metric)

  2) Next

  HTable::ClientScanner::next()
  => ScannerCallable::call()
  => HRegionServer::next(long scannerId)
  => HRegionServer::next(final long scannerId, int nbRows)  [pass
  METRIC_NEXTSIZE to the callee]

  => HRegion::RegionScanner::next(List<KeyValue> outResults, String
  metric)
  => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit,
  String metric)
  => HRegion::RegionScanner::nextInternal(int limit, String metric)
  => KeyValueHeap::next(List<KeyValue> result, int limit, String
  metric)
  => StoreScanner::next(List<KeyValue> outResult, int limit, String
  metric)
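
  A condensed sketch of the counting end of that chain (a fragment, not the patch itself; the metric-name prefix handling is assumed):

{code}
// Inside StoreScanner.next(List<KeyValue> outResult, int limit,
// String metric): the per-CF size metric is only bumped when a metric
// name was passed down the chain. Compaction scans pass null, so
// their reads never touch getsize/nextsize.
if (metric != null) {                       // null == compaction read
  HRegion.incrNumericMetric(this.metricNamePrefix + metric,
      copyKv.getLength());
}
outResult.add(copyKv);
{code}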

  Task ID: #898948

  Blame Rev:

TEST PLAN
  1. Passed unit tests.
  2. Created a testcase TestRegionServerMetrics::testGetNextSize to
  guarantee:
   * Get/Next contributes to getsize/nextsize metrics
   * Both getsize/nextsize are per Column Family
   * Flush/compaction won't affect these two metrics

  Revert Plan:

  Tags:

REVISION DETAIL
  https://reviews.facebook.net/D1617

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  src/main/java/org/apache/hadoop/hbase/regionserver/InternalScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/3441/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, 
 D1527.4.patch, D1617.1.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202100#comment-13202100
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Ted: I forgot to state that one can change the default checksum algorithm 
anytime. No disk format upgrade is necessary. Each hfile stores the checksum 
algorithm that is used to store data inside it. If today you use CRC32 and 
tomorrow you change the configuration setting to CRC32C, then new files that 
are generated (as part of memstore flushes and compactions) will start using 
CRC32C, while older files will continue to be verified via the CRC32 algorithm.
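
A minimal illustration of why no upgrade is needed (the enum, the selection logic, and the class shape are assumptions, not the patch's real layout):

{code}
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// The verifier is chosen from the algorithm id recorded in each HFile,
// not from the cluster-wide default, so old CRC32 files stay
// verifiable after the default flips to CRC32C.
public class PerFileVerifier {
  enum ChecksumType { NULL, CRC32, CRC32C }

  static Checksum verifierFor(ChecksumType storedInFile) {
    switch (storedInFile) {
      case CRC32:  return new CRC32();
      case CRC32C: return newCrc32CViaReflection();
      default:     return null;   // NULL: checksums disabled
    }
  }

  static Checksum newCrc32CViaReflection() {
    try {
      // PureJavaCrc32C ships with hadoop 2.0-era builds
      return (Checksum) Class.forName(
          "org.apache.hadoop.util.PureJavaCrc32C").newInstance();
    } catch (Exception e) {
      throw new IllegalStateException("CRC32C not on classpath", e);
    }
  }
}
{code}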

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5074) support checksums in HBase block cache

2012-02-06 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202099#comment-13202099
 ] 

Phabricator commented on HBASE-5074:


dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in 
HBase block cache.

  Ted: I forgot to state that one can change the default checksum algorithm 
anytime. No disk format upgrade is necessary. Each hfile stores the checksum 
algorithm that is used to store data inside it. If today you use CRC32 and 
tomorrow you change the configuration setting to CRC32C, then new files that 
are generated (as part of memstore flushes and compactions) will start using 
CRC32C, while older files will continue to be verified via the CRC32 algorithm.

REVISION DETAIL
  https://reviews.facebook.net/D1521


 support checksums in HBase block cache
 --

 Key: HBASE-5074
 URL: https://issues.apache.org/jira/browse/HBASE-5074
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, 
 D1521.2.patch, D1521.3.patch, D1521.3.patch


 The current implementation of HDFS stores the data in one block file and the 
 metadata(checksum) in another block file. This means that every read into the 
 HBase block cache actually consumes two disk iops, one to the datafile and 
 one to the checksum file. This is a major problem for scaling HBase, because 
 HBase is usually bottlenecked on the number of random disk iops that the 
 storage-hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202103#comment-13202103
 ] 

stack commented on HBASE-5341:
--

bq. This would break the ability to compile HBase 0.92+ against Hadoop releases 
without security.

Perhaps we could entertain breaking this for 0.94.0?  i.e. saying we only run 
on hadoops w/ security? (CDH3 has it.  What that we'd want to run on won't 
have it by the time 0.94.0 is out?)

On modularization, yes, if HBASE-4336 is done soon, security is a natural.  
Otherwise, we should do as Enis suggests.




 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security-related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and the security-related 
 code is not loaded by default. In this issue I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-02-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202105#comment-13202105
 ] 

Lars Hofhansl commented on HBASE-5229:
--

It's even simpler. A coprocessor endpoint has access to its region. If I rename 
internalMutate from my patch to mutateRowsWithLocks(List<Mutation> mutations, 
Set<String> rowsToLock) and make it public in HRegion, it can be called from a 
coprocessor endpoint.
It would not be exposed in HRegionInterface, HRegionServer, RegionServerServices, 
or the HTableInterfaces.
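
A sketch of such an endpoint (the endpoint class itself is hypothetical; mutateRowsWithLocks is the proposed method, not yet a real API):

{code}
import java.io.IOException;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.HRegion;

// The endpoint runs inside the region, so the proposed public
// HRegion.mutateRowsWithLocks(...) never has to surface in
// HRegionInterface or the client-facing HTableInterface.
public class MultiRowEndpoint extends BaseEndpointCoprocessor {
  public void mutateRows(List<Mutation> mutations, Set<String> rowsToLock)
      throws IOException {
    HRegion region = ((RegionCoprocessorEnvironment)
        getEnvironment()).getRegion();
    region.mutateRowsWithLocks(mutations, rowsToLock);  // proposed API
  }
}
{code}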

 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 
 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans

2012-02-06 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202108#comment-13202108
 ] 

stack commented on HBASE-5325:
--

@Enis metrics2 seems a bit out there for us (hadoop 0.23?). We want to run on 
0.23 and 1.0 and 2.0, etc., so it'd be a while before we could lean on it.  
Does metrics2 have facility that would help? (I've not studied it.)

@Hitesh Regarding "I am still digging into jmx internals but I could not find 
anything which mentions it as an option for pushing information", even if 
there were a means (IIRC there is, but I am likely off), I think we'd have the 
master pulling.

bq. Having the master pull information from all region servers using jmx ( or 
any other point to point protocol ) would likely be a bad idea from a 
performance point of view.

Currently every regionserver sends status every (configurable) second.  It's a 
fat Writable serialization of each regionserver's counters and current state.  
IIRC, this mechanism runs mostly independent of and beside our metrics (so 
there'll be Writable serialization of region state, and if something like tsdb is 
running, there'll be a JMX serialization of server state happening too).  
It would be an improvement if we did metrics reporting one way only, if possible.

bq. Also, was your intention to have the HMaster be a metric aggregator for the 
RegionServers' metrics?

It does this now for key stats.

bq. I still need to look at nesting of mbeans from various components and also 
need to look at the hbase code in more detail to see what kind of management 
options could be exposed via jmx.

I'd be interested in what you think.  We need to figure out being able to configure a 
running cluster; i.e. change Configuration values and have hbase notice.  
Having this go via jmx would likely be like taking the 'killarney road to 
dingle', as my grandma used to say (it's shorter if you take the tralee road), so 
maybe jmx is read-only rather than 'management'.


 Expose basic information about the master-status through jmx beans 
 ---

 Key: HBASE-5325
 URL: https://issues.apache.org/jira/browse/HBASE-5325
 Project: HBase
  Issue Type: Improvement
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch


 Similar to the Namenode and Jobtracker, it would be good if the hbase master 
 could expose some information through mbeans.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202124#comment-13202124
 ] 

Jonathan Hsieh commented on HBASE-5341:
---

I'd be happy if the tarball(s) for 0.92.1 just come out with the ./security 
directory in the correct place in the tarball.  I would think that would be the 
most expedient thing to do.  (This is, however, likely easier said than done.)

@Gary 
- does compiling against hdfs 1.0.0 and running this new hbase jar in 
non-secure mode against the append-branch hdfs jar work?  If it works, I'm not 
convinced compilation matters as much.  I do most of my system testing of these 
releases from jars compiled against hadoop 1.0.0 but running on top of a cdh3 
hdfs version -- no problems.  If the hbase binary doesn't work, then I agree 
that this is a concern -- it would block shops locked into their own hdfs branches 
that don't support hadoop security.
- using the module approach, the security stuff would be a separate jar that we 
could add to the classpath, right?

@Ted
- There are plenty of pieces we've released that haven't been tested by many 
people.  That said, regardless of how it's included, I can commit that we'll be 
testing the access control and security features.

@Stack 
- Hadoop 1.0.0 and CDH3 both have security and append.  The next HDFS it seems 
most folks are coalescing around is 0.23, which also has security and append.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security-related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and the security-related 
 code is not loaded by default. In this issue I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4658) Put attributes are not exposed via the ThriftServer

2012-02-06 Thread dhruba borthakur (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4658:


   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Put attributes are not exposed via the ThriftServer
 ---

 Key: HBASE-4658
 URL: https://issues.apache.org/jira/browse/HBASE-4658
 Project: HBase
  Issue Type: Bug
  Components: thrift
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: D1563.1.patch, D1563.1.patch, D1563.1.patch, 
 D1563.2.patch, D1563.2.patch, D1563.2.patch, D1563.3.patch, D1563.3.patch, 
 D1563.3.patch, ThriftPutAttributes1.txt


 The Put api also takes in a bunch of arbitrary attributes that an application 
 can use to associate metadata with each put operation. This is not exposed 
 via Thrift.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression

2012-02-06 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202127#comment-13202127
 ] 

dhruba borthakur commented on HBASE-5313:
-

One option listed above is to keep all the keys at the beginning of the block 
and all the values at the end of the block. The keys will still be 
delta-encoded. The values can be lzo-compressed. A sketch of that layout 
follows below.

Any other ideas out there?
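
(A minimal, purely illustrative sketch of the split layout; a real block format would also record the key-section length so readers can find the value section:)

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Stage keys and values in separate buffers while the block is open,
// then write the key section followed by the value section, so each
// section can be encoded on its own (delta-encoded keys, lzo'd values)
// and scans can skip rows without ever touching the values.
public class SplitBlockWriter {
  private final ByteArrayOutputStream keys = new ByteArrayOutputStream();
  private final ByteArrayOutputStream values = new ByteArrayOutputStream();

  public void append(byte[] key, byte[] value) throws IOException {
    keys.write(key);
    values.write(value);
  }

  public byte[] finishBlock() throws IOException {
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    keys.writeTo(block);     // key section first ...
    values.writeTo(block);   // ... values at the end of the block
    return block.toByteArray();
  }
}
{code}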

 Restructure hfiles layout for better compression
 

 Key: HBASE-5313
 URL: https://issues.apache.org/jira/browse/HBASE-5313
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 A HFile block contain a stream of key-values. Can we can organize these kvs 
 on the disk in a better way so that we get much greater compression ratios?
 One option (thanks Prakash) is to store all the keys in the beginning of the 
 block (let's call this the key-section) and then store all their 
 corresponding values towards the end of the block. This will allow us to 
 not-even decompress the values when we are scanning and skipping over rows in 
 the block.
 Any other ideas? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java

2012-02-06 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202133#comment-13202133
 ] 

Andrew Purtell commented on HBASE-5343:
---

bq.  We can add grant/revoke/user_permission commands similar to the shell 
interface to HBaseAdmin assuming HBASE-5341 is in.

-1

This issue should be resolved as 'invalid'.

 Access control API in HBaseAdmin.java  
 ---

 Key: HBASE-5343
 URL: https://issues.apache.org/jira/browse/HBASE-5343
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 To use the access control mechanism added in HBASE-3025, users should either 
 use the shell interface, or use the coprocessor API directly, which is not 
 very user friendly. We can add grant/revoke/user_permission commands similar 
 to the shell interface to HBaseAdmin assuming HBASE-5341 is in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default

2012-02-06 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202135#comment-13202135
 ] 

Andrew Purtell commented on HBASE-5341:
---

If we drop support for 0.20.x then compilation issues mostly go away. It won't 
fail for lack of security APIs in Hadoop.

I lean toward Gary's view that security belongs in a (Maven) module.

 HBase build artifact should include security code by default 
 

 Key: HBASE-5341
 URL: https://issues.apache.org/jira/browse/HBASE-5341
 Project: HBase
  Issue Type: Improvement
  Components: build, security
Affects Versions: 0.94.0, 0.92.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 HBase 0.92.0 was released with two artifacts, plain and security. The 
 security code is built with -Psecurity. There are two tarballs, but only the 
 plain jar is in the maven repo at repository.a.o. 
 I see no reason to do a separate artifact for the security-related code, 
 since 0.92 already depends on secure Hadoop 1.0.0, and the security-related 
 code is not loaded by default. In this issue I propose we merge the 
 code under /security into src/ and remove the maven profile. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira