[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5074: --- Attachment: D1521.3.patch dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Reviewers: mbautin Many new goodies, thanks to the feedback from Mikhail and Todd. This completes my addressing of all the current review comments. If somebody can re-review it again, that would be great.
1. The bytesPerChecksum is configurable. One can set hbase.hstore.bytes.per.checksum in the config to set this. The default value is 16K. Similarly, one can set hbase.hstore.checksum.name to either CRC32 or CRC32C. The default is CRC32. If the PureJavaCRC32 algorithm is available in the classpath, then it is used; otherwise it falls back to using java.util.zip.CRC32. Each checksum value is assumed to be 4 bytes; it is currently not configurable (any comments here?). The reflection-based method of creating checksum objects is reworked to incur much lower overhead.
2. If an hbase-level CRC check fails, then it falls back to using hdfs-level checksums for the next few reads (defaults to 100). After that, it will retry using hbase-level checksums. I picked 100 as the default so that even in the case of continuous hbase-checksum failures, the overhead for additional iops is limited to 1%. Enhanced the unit test to validate this behaviour.
3. Enhanced unit tests to test different sizes of bytesPerChecksum. Also, added JMX metrics to record the number of times hbase-checksum verification failures occur.
REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java src/main/java/org/apache/hadoop/hbase/HConstants.java 
src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java src/main/java/org/apache/hadoop/hbase/util/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumByteArrayOutputStream.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
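The classpath probe described in item 1 can be sketched roughly as follows. This is a hedged illustration, not the patch's actual ChecksumFactory: the class name org.apache.hadoop.util.PureJavaCRC32 is Hadoop's pure-Java CRC32 implementation, while the class and method names below are assumptions made for the example.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of the reflective probe: prefer Hadoop's PureJavaCRC32 when it is
// on the classpath, otherwise fall back to java.util.zip.CRC32.
public class ChecksumProbe {
    static final String PURE_JAVA_CRC32 = "org.apache.hadoop.util.PureJavaCRC32";

    // Return a Checksum instance, preferring PureJavaCRC32 when available.
    static Checksum newCrc32() {
        try {
            Class<?> clazz = Class.forName(PURE_JAVA_CRC32);
            return (Checksum) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Hadoop not on the classpath: use the JDK implementation.
            return new CRC32();
        }
    }

    public static void main(String[] args) {
        Checksum sum = newCrc32();
        byte[] data = "hbase-block".getBytes();
        sum.update(data, 0, data.length);
        System.out.println(sum.getClass().getName() + " -> " + sum.getValue());
    }
}
```

The comment also notes that the lookup result should be cached (e.g. resolving the Constructor once) so the reflection cost is not paid per checksum object.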
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5074: -- Status: Patch Available (was: Open) support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201200#comment-13201200 ] Hadoop QA commented on HBASE-5074: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513416/D1521.3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 76 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -133 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit org.apache.hadoop.hbase.io.hfile.TestHFileBlock org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/907//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/907//console This message is automatically generated. 
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201205#comment-13201205 ] Anoop Sam John commented on HBASE-2038: --- Hi Lars, I am also working on a secondary index and I have found the IHBase concept to be good. But we need this to be moved to a coprocessor-based approach so that the kernel code of HBase need not be different for the secondary index. IHBase makes the scan go through all the regions (as you said), but it will skip and seek to later positions in the heap, avoiding many of the possible data reads from HDFS etc. In the current coprocessor framework, we call preScannerNext() from HRegionServer next(final long scannerId, int nbRows) and pass the RegionScanner to the coprocessor. But per the IHBase approach, within the coprocessor we should be able to seek to the correct row where the indexed column value equals our value. We cannot do this as of now because RegionScanner has no seek(). Also, preScannerNext() will be called once before the actual next(final long scannerId, int nbRows) call happens on the region. Depending on the caching value at the client side, nbRows might be more than one. Now suppose nbRows=2 and the region has two matching rows, one somewhere in the middle of one HFile and the other in another HFile. Per IHBase we should first seek to the position of the first row and, after reading that data, seek to the next position. With the current way preScannerNext() is called, this won't be possible. So I think we might need some changes in this area? What do you say? Meanwhile, what is your plan: to continue with the IHBase way of storing the index in memory for each region, or some change to this?
Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so region level indexing can be reimplemented as a coprocessor without any loss of functionality.
[jira] [Created] (HBASE-5338) Add SKIP support to importtsv
Add SKIP support to importtsv -- Key: HBASE-5338 URL: https://issues.apache.org/jira/browse/HBASE-5338 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial It'd be nice to have support for SKIP mappings so that you can omit columns from the TSV during the import. For example {code} -Dimporttsv.columns=SKIP,HBASE_ROW_KEY,cf1:col1,cf1:col2,SKIP,SKIP,cf2:col1... {code} Or maybe HBASE_SKIP_COLUMN to be less ambiguous.
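A sketch of how the proposed SKIP marker might be handled when parsing the importtsv.columns mapping. The marker name "SKIP" follows the proposal above; the parser, class, and method names are illustrative assumptions, not ImportTsv's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for a columns spec: positions mapped to SKIP produce
// a null entry, meaning that TSV column is dropped during import.
public class ColumnSpecParser {
    // Returns one entry per TSV column, null for skipped positions.
    static List<String> parse(String spec) {
        List<String> out = new ArrayList<>();
        for (String col : spec.split(",")) {
            out.add(col.equals("SKIP") ? null : col);
        }
        return out;
    }
}
```

A mapper built on this would simply not emit a KeyValue for null entries, so extra TSV columns never reach HBase.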
[jira] [Created] (HBASE-5339) Add support for compound keys to importtsv
Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Reporter: Lars George
[jira] [Updated] (HBASE-5339) Add support for compound keys to importtsv
[ https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars George updated HBASE-5339: --- Component/s: mapreduce Description: Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed. Priority: Trivial (was: Major) Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed.
[jira] [Commented] (HBASE-5339) Add support for compound keys to importtsv
[ https://issues.apache.org/jira/browse/HBASE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201255#comment-13201255 ] Lars George commented on HBASE-5339: Obviously, you can rearrange the compound key parts by using different HBASE_ROW_KEY_N values, where the ordering implied by N is up to the user. {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_3,cf1:col1,cf2:col3,HBASE_ROW_KEY_2 {code} Add support for compound keys to importtsv -- Key: HBASE-5339 URL: https://issues.apache.org/jira/browse/HBASE-5339 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Lars George Priority: Trivial Add support so that you can combine some columns from the TSV with either a given separator, no separator, or a custom row key generator class. Syntax could be: {code} -Dimporttsv.columns=HBASE_ROW_KEY_1,HBASE_ROW_KEY_2,cf1:col1,cf2:col3,HBASE_ROW_KEY_3 -Dimporttsv.rowkey.separator=- {code} Another option of course is using a custom mapper class and handling this there, but this also seems like a nice-to-have option, probably covering the 80% of cases where this sort of thing is needed.
[jira] [Commented] (HBASE-5335) Dynamic Schema Configurations
[ https://issues.apache.org/jira/browse/HBASE-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201351#comment-13201351 ] Nicolas Spiegelberg commented on HBASE-5335: @Lars: the original idea was to allow users to arbitrarily set KVs in the HTableDescriptor and HColumnDescriptor, but make it so users know that what they're doing is not checked. We need some sort of format to distinguish between reserved and non-reserved keywords (thinking of doing this on the client side). As a config value becomes more well-known, we can enforce limitations like you stated. I'd rather have this evolve by having a handful of users who want to set a config value, learn over the long term that it is useful, and incrementally refactor the code to ease support for that config. I don't want to get into a spot where we have to do a large refactor to support this feature and do extensive sanity checking, only to determine that we only need 20% of the config values. Dynamic Schema Configurations - Key: HBASE-5335 URL: https://issues.apache.org/jira/browse/HBASE-5335 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Labels: configuration, schema Currently, the ability for a core developer to add per-table per-CF configuration settings is very heavyweight. You need to add a reserved keyword all the way up the stack, and you have to support this variable long-term if you're going to expose it explicitly to the user. This has ended up with using Configuration.get() a lot because it is lightweight and you can tweak settings while you're trying to understand system behavior [since there are many config params that may never need to be tuned]. We need to add the ability to put/read arbitrary KV settings in the HBase schema. Combined with online schema change, this will allow us to safely iterate on configuration settings.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201354#comment-13201354 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4833 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10616 Do we need to be sorting rowsToLock? I'm thinking of multiple concurrent mutateRows operation, trying to lock the same set of rows. Perhaps, throwing IOException is going to prevent us from a situation where we end up with a deadlock. But, we still might want to sort it to ensure (better) progress (no livelock). - Amitanand On 2012-02-03 19:59:55, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-03 19:59:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. 
Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201365#comment-13201365 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- bq. On 2012-02-06 15:52:43, Amitanand Aiyer wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4212 bq. https://reviews.apache.org/r/3748/diff/2/?file=72266#file72266line4212 bq. bq. Do we need to be sorting rowsToLock? bq. bq. I'm thinking of multiple concurrent mutateRows operation, trying to lock the same set of rows. bq. bq. Perhaps, throwing IOException is going to prevent us from a situation where we end up with a deadlock. But, we still might want to sort it to ensure (better) progress (no livelock). MutateRows sorts them (by using a TreeSet with Bytes.BYTES_COMPARATOR), for exactly this reason. Maybe this should be called out here, by making the argument a SortedSet. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4833 --- On 2012-02-03 19:59:55, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-03 19:59:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1239953 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1239953 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. 
Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in
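The lock-ordering point from the review above (sort rowsToLock so concurrent mutateRows calls over the same rows cannot deadlock) can be sketched as follows. The comparator is an illustrative stand-in for HBase's Bytes.BYTES_COMPARATOR (unsigned lexicographic byte order); the class and method names are assumptions for the example, not HRegion's actual code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Sketch: acquire row locks in a canonical (sorted) order so two concurrent
// multi-row mutations over overlapping row sets always lock in the same
// order and therefore cannot deadlock.
public class RowLockOrdering {
    // Unsigned lexicographic byte order, mimicking Bytes.BYTES_COMPARATOR.
    static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    };

    // Rows are deduplicated and sorted; taking locks in iteration order
    // over this set is deadlock-free across concurrent callers.
    static TreeSet<byte[]> sortedRowsToLock(List<byte[]> rows) {
        TreeSet<byte[]> sorted = new TreeSet<>(BYTES_COMPARATOR);
        sorted.addAll(rows);
        return sorted;
    }
}
```

Making the parameter a SortedSet, as Lars suggests, would encode this precondition in the method signature instead of relying on callers to remember it.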
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201400#comment-13201400 ] Phabricator commented on HBASE-5074: tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:425 This cast is not safe. See https://builds.apache.org/job/PreCommit-HBASE-Build/907//testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFiles/testSimpleLoad/: Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.DistributedFileSystem cannot be cast to org.apache.hadoop.hbase.util.HFileSystem at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:425) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:433) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:407) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:328) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:326) src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 Should we default to CRC32C ? src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:2 No year is needed. src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java:59 Shall we name this variable ctor ? Similar comment applies to other meth variables in this patch. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. 
This means that every read into the HBase block cache actually consumes two disk iops, one to the data file and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
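Ted's inline comment above flags the hard cast at HFile.java:425, which throws ClassCastException when a plain DistributedFileSystem is passed in. A sketch of the check-then-wrap pattern that avoids such a cast failure, using stand-in classes rather than the actual Hadoop/HBase types, and not the patch's eventual fix:

```java
// Illustrative only: FileSystem/HFileSystem here are stand-ins for the
// Hadoop/HBase classes; the real fix in the patch may differ.
public class CastGuardDemo {
    static class FileSystem {}

    // Hypothetical wrapper type, analogous to HFileSystem wrapping a FileSystem.
    static class HFileSystem extends FileSystem {
        final FileSystem underlying;
        HFileSystem(FileSystem underlying) { this.underlying = underlying; }
    }

    // Instead of an unconditional (HFileSystem) cast, reuse the instance if it
    // already has the right type and wrap it otherwise.
    static HFileSystem asHFileSystem(FileSystem fs) {
        if (fs instanceof HFileSystem) {
            return (HFileSystem) fs;   // safe: checked first
        }
        return new HFileSystem(fs);    // wrap rather than ClassCastException
    }

    public static void main(String[] args) {
        FileSystem plain = new FileSystem();          // like DistributedFileSystem
        HFileSystem wrapped = asHFileSystem(plain);   // no ClassCastException
        assert wrapped.underlying == plain;
        assert asHFileSystem(wrapped) == wrapped;     // already the right type
    }
}
```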
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5074:
------------------------------
Comment: was deleted (was: a duplicate of tedyu's review comment above)

REVISION DETAIL
https://reviews.facebook.net/D1521
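The ChecksumFactory comments in the HBASE-5074 review above (name the constructor variable `ctor`; rework reflection to lower overhead) boil down to: resolve the `Constructor` once, then reuse it for every checksum object. A hedged sketch using `java.util.zip.CRC32` as the configured class; the class and method names here are illustrative, not the patch's:

```java
import java.lang.reflect.Constructor;
import java.util.zip.Checksum;

public class ChecksumFactoryDemo {
    // Reflective lookup happens once; per-call cost is just newInstance().
    private final Constructor<? extends Checksum> ctor;

    ChecksumFactoryDemo(String className) {
        try {
            Class<? extends Checksum> clazz =
                Class.forName(className).asSubclass(Checksum.class);
            this.ctor = clazz.getDeclaredConstructor();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load checksum class " + className, e);
        }
    }

    Checksum newChecksum() {
        try {
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChecksumFactoryDemo factory = new ChecksumFactoryDemo("java.util.zip.CRC32");
        Checksum c1 = factory.newChecksum();
        Checksum c2 = factory.newChecksum();
        byte[] data = "hbase".getBytes();
        c1.update(data, 0, data.length);
        c2.update(data, 0, data.length);
        assert c1.getValue() == c2.getValue(); // deterministic across instances
        assert c1 != c2;                       // fresh object per call
    }
}
```

The same factory could be pointed at a CRC32C implementation when one is on the classpath, which is the fallback behavior the review discusses.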
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201408#comment-13201408 ]

Zhihong Yu commented on HBASE-5267:
-----------------------------------
@J-D: Do you want to take a look at patch v3?

Add a configuration to disable the slab cache by default
--------------------------------------------------------

Key: HBASE-5267
URL: https://issues.apache.org/jira/browse/HBASE-5267
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Li Pi
Priority: Blocker
Fix For: 0.94.0, 0.92.1
Attachments: 5267.txt, 5267v2.txt, 5267v3.txt

From what I commented at the tail of HBASE-4027:
{quote}
I changed the release note, the patch doesn't have a hbase.offheapcachesize configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize (which is actually a big problem when you consider this: http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak).
{quote}
We need to add hbase.offheapcachesize and set it to false by default. Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request.
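What the comment asks for is a boolean switch that keeps the slab cache off unless explicitly enabled. Assuming the property name proposed in the comment (hbase.offheapcachesize is the proposal's name and semantics, not a released HBase configuration), the hbase-site.xml entry might look like:

```xml
<!-- Hypothetical: property name and default taken from the comment's proposal. -->
<property>
  <name>hbase.offheapcachesize</name>
  <value>false</value>
  <description>
    Enable the off-heap slab cache. Off by default, so that setting
    -XX:MaxDirectMemorySize alone no longer turns the cache on.
  </description>
</property>
```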
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201431#comment-13201431 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
I don't see a ClassNotFoundException in the test output, but the following may provide a clue:
{code}
2012-02-06 09:44:48,377 WARN  [main] mapreduce.JobSubmitter(139): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-02-06 09:44:48,380 WARN  [main] mapreduce.JobSubmitter(241): No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2012-02-06 09:44:51,163 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(192): Exit code from task is : 127
2012-02-06 09:44:51,165 WARN  [ContainersLauncher #0] launcher.ContainerLaunch(273): Container exited with a non-zero exit code 127
{code}

Fix TestHFileOutputFormat to work against hadoop 0.23
-----------------------------------------------------

Key: HBASE-5317
URL: https://issues.apache.org/jira/browse/HBASE-5317
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch

Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92:

Failed tests:
  testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found

Tests in error:
  test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory)
  testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable
  testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

It looks like on trunk, this also results in an error:
testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable

I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet.
[jira] [Updated] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jai Kumar Singh updated HBASE-5166:
-----------------------------------
Status: Patch Available (was: Open)

MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
----------------------------------------------------------------------

Key: HBASE-5166
URL: https://issues.apache.org/jira/browse/HBASE-5166
Project: HBase
Issue Type: Improvement
Reporter: Jai Kumar Singh
Priority: Minor
Labels: multithreaded, tablemapper
Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch
Original Estimate: 0.5h
Remaining Estimate: 0.5h

There is currently no MultiThreadedTableMapper in HBase, analogous to Hadoop's MultiThreadedMapper for IO-bound jobs. Use case, web crawler: take input (URLs) from an HBase table and put the content (url, content) back into HBase. Running this kind of HBase MapReduce job with the normal table mapper is quite slow because the CPU is not fully utilized (the job is network/IO bound). Moreover, I want to know whether it would be a good or bad idea to use HBase for this kind of use case.
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201433#comment-13201433 ]

Jai Kumar Singh commented on HBASE-5166:
----------------------------------------
Any comments?
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201439#comment-13201439 ]

Zhihong Yu commented on HBASE-5166:
-----------------------------------
MultithreadedTableMapper is missing the Apache license header.
{code}
+while(!executor.isTerminated()){
+  // wait till all the threads are done
+}
{code}
We should put a sleep() in the above loop and possibly limit the total duration of the wait.

A new unit test should be added for MultithreadedTableMapper. Please look at tests that use TableMapper.
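Ted's suggestion, sleep inside the loop and bound the total wait, is exactly what ExecutorService.awaitTermination already provides: it blocks without spinning and returns false when the deadline passes. A minimal sketch of replacing the busy loop, not the actual MultithreadedTableMapper patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownWaitDemo {
    // Replaces `while (!executor.isTerminated()) {}`: no spinning, bounded wait.
    static boolean shutdownAndWait(ExecutorService executor, long timeoutSec) {
        executor.shutdown(); // stop accepting new tasks, let running ones finish
        try {
            return executor.awaitTermination(timeoutSec, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            executor.submit(() -> { /* per-thread mapper work would run here */ });
        }
        boolean finished = shutdownAndWait(executor, 60);
        assert finished; // all tasks completed within the bound
    }
}
```

A caller can decide what to do when the method returns false (log and force shutdownNow(), or fail the task), which addresses the unbounded-wait concern.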
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201444#comment-13201444 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
For HBASE-5317-v1.patch, I think we shouldn't simply catch TableExistsException. We should add the missing Configuration parameters in MRv2 so that no TableExistsException occurs during the test.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201460#comment-13201460 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy for providing up-to-date config parameters to MRv2.
[jira] [Issue Comment Edited] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201460#comment-13201460 ]

Zhihong Yu edited comment on HBASE-5317 at 2/6/12 6:47 PM:
-----------------------------------------------------------
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with a good strategy for providing up-to-date config parameters to MRv2 (when the hadoop.profile property carries the value 23).

was (Author: zhi...@ebaysf.com):
Configuration.handleDeprecation() is private. We may need to borrow deprecatedKeyMap and come up with good strategy of providing up-to-date config parameters to MRv2.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201473#comment-13201473 ]

Gregory Chanan commented on HBASE-5317:
---------------------------------------
@Ted: I agree we shouldn't just catch the TableExistsException -- that's why I didn't initially post that part of the patch. My recollection of that issue was that the MiniMRCluster is creating a "target" directory in hbase.rootdir [I could be wrong about the exact location]. When we call table.getTableDescriptor(), it can't get a table descriptor for "target", so it throws a TableExistsException. Can the handleDeprecation call prevent the target directory from being created? Or are you thinking of something else?

It also seems a little strange to me that calling table.getTableDescriptor() tries to get table descriptors for *everything* in hbase.rootdir. Why should I get an exception thrown if HBase can't find a TableDescriptor for "target" when I am only asking about "table"?
[jira] [Created] (HBASE-5340) HFile/LoadIncrementalHFiles should specify file name when it fails to load a file.
HFile/LoadIncrementalHFiles should specify file name when it fails to load a file.
----------------------------------------------------------------------------------

Key: HBASE-5340
URL: https://issues.apache.org/jira/browse/HBASE-5340
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: Jonathan Hsieh

I was attempting to do a bulk load and got this error message. Unfortunately it didn't tell me which file had the problem.
{code}
Exception in thread "main" java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
        at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryLoad(LoadIncrementalHFiles.java:204)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:173)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:452)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:457)
{code}
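The missing context can be added at the per-file call site: catch the low-level IOException and rethrow it with the path attached, keeping the original failure as the cause. A hedged sketch; the method names and message wording here are illustrative, not the eventual HBASE-5340 patch:

```java
import java.io.IOException;

public class NamedLoadFailureDemo {
    // Stand-in for HFile trailer parsing that fails without naming the file.
    static void readTrailer() throws IOException {
        throw new IOException("Trailer 'header' is wrong; does the trailer size match content?");
    }

    // Wrap the low-level failure so the operator sees which file was bad.
    static void tryLoad(String hfilePath) throws IOException {
        try {
            readTrailer();
        } catch (IOException e) {
            throw new IOException("Failed to load HFile " + hfilePath, e);
        }
    }

    // Helper for demonstration: surface the wrapped message.
    static String loadErrorMessage(String hfilePath) {
        try {
            tryLoad(hfilePath);
            return "";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String msg = loadErrorMessage("hdfs://example/bulk/info-A/hfile1"); // hypothetical path
        assert msg.contains("hfile1"); // the file name is now in the error
    }
}
```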
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201488#comment-13201488 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
The handleDeprecation() call may not prevent the target directory from being created. We should try to find out why the 'target' directory is only created for hadoop 0.23.
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201491#comment-13201491 ]

Zhihong Yu commented on HBASE-5317:
-----------------------------------
Not every directory under hbase.rootdir is deemed to correspond to a table. See the following javadoc in FSUtils.getTableDirs():
{code}
 * @return All the table directories under <code>rootdir</code>. Ignore non table hbase folders such as
 * .logs, .oldlogs, .corrupt, .META., and -ROOT- folders.
{code}
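The quoted javadoc explains Gregory's "target" failure: the skip list only covers HBase's own folders, so anything else under hbase.rootdir is treated as a table directory. A stand-in sketch of that filter (the skip set mirrors the javadoc; this is not the FSUtils code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TableDirsDemo {
    // Non-table folders listed in the FSUtils.getTableDirs javadoc.
    static final Set<String> NON_TABLE_DIRS = new HashSet<>(Arrays.asList(
        ".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-"));

    // Everything under hbase.rootdir that is not in the skip set is
    // assumed to be a table directory.
    static List<String> tableDirs(List<String> rootDirEntries) {
        List<String> tables = new ArrayList<>();
        for (String dir : rootDirEntries) {
            if (!NON_TABLE_DIRS.contains(dir)) {
                tables.add(dir);
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(".logs", "-ROOT-", "TestTable", "target");
        // "target" is not in the skip list, so it is (wrongly) treated as a table,
        // which is why getTableDescriptor() later fails on it.
        assert tableDirs(entries).equals(Arrays.asList("TestTable", "target"));
    }
}
```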
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201515#comment-13201515 ]

jirapos...@reviews.apache.org commented on HBASE-5229:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3748/
-----------------------------------------------------------

(Updated 2012-02-06 19:51:58.341235)

Review request for hbase.

Changes
-------
Addressed a few comments. In addition, the client can retry a MultiRowMutation if the first row is not available in the region (as that might indicate that the region moved).

Summary
-------
This builds on HBASE-3584, HBASE-5203, and HBASE-5304. Multiple rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. Obviously this is an advanced feature, and this is prominently called out in the Javadoc.

This addresses bug HBASE-5229.
https://issues.apache.org/jira/browse/HBASE-5229 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 Diff: https://reviews.apache.org/r/3748/diff Testing --- Tests added to TestFromClientSide and TestAtomicOperation Thanks, Lars Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. 
Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
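The prefix-grouping idea in option #2 above can be sketched in a few lines. This is a hypothetical illustration, not code from any attached patch: a split policy would clamp any proposed split key to the configured prefix length, so a region boundary can never fall inside a grouping prefix.

```java
import java.util.Arrays;

// Sketch of option #2: clamp a proposed split point to a grouping-prefix
// boundary so a region is never split inside a prefix group. The class and
// method names are illustrative only.
class PrefixSplitSketch {
    // Truncate the proposed split key to the configured prefix length so all
    // rows sharing the prefix stay in the same region.
    static byte[] clampToPrefix(byte[] proposedSplitKey, int prefixLength) {
        if (proposedSplitKey.length <= prefixLength) {
            return proposedSplitKey;
        }
        return Arrays.copyOf(proposedSplitKey, prefixLength);
    }

    public static void main(String[] args) {
        // Group on the 7-byte "tenantA" prefix of the row key.
        byte[] split = clampToPrefix("tenantA-row17".getBytes(), 7);
        System.out.println(new String(split)); // prints "tenantA"
    }
}
```

In a real RegionSplitPolicy the prefix length would come from the HTableDescriptor; byte-wise truncation like this assumes fixed-length prefixes.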
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201529#comment-13201529 ] Phabricator commented on HBASE-5074: todd has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. I haven't thought about it quite enough, but is there any way to do this without leaking the HFileSystem out to the rest of the code? As Ted pointed out, there are some somewhat public interfaces that will probably get touched by that, and the number of places it has required changes in unrelated test cases seems like a code smell to me. Maybe this could be a static cache somewhere, that given a FileSystem instance, it maintains the un-checksummed equivalents thereof as weak references? Then the concept would be self-contained within the HFile code, which until now has been a fairly standalone file format. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
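Todd's weak-reference cache could look roughly like the sketch below. Everything here is hypothetical and stands in for the real types (Hadoop's FileSystem is replaced by Object so the snippet is self-contained); the point is only that a static map with weak keys keeps the concept inside the HFile code without touching public interfaces.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a static cache that, given a FileSystem instance (stubbed here as
// Object), hands back its checksum-disabled equivalent. Weak keys let an
// unused FileSystem, and its cached twin, be garbage collected.
class NoChecksumFsCache {
    private static final Map<Object, Object> CACHE = new WeakHashMap<>();

    static synchronized Object getNoChecksumFs(Object fs) {
        // Note: the cached value must not hold a strong reference back to the
        // key, or the weak entry can never be collected.
        return CACHE.computeIfAbsent(fs, f -> createNoChecksumEquivalent());
    }

    private static Object createNoChecksumEquivalent() {
        // In a real implementation this would return a wrapper with HDFS
        // checksum verification turned off.
        return new Object();
    }
}
```

The synchronized accessor is the simplest way to make WeakHashMap safe for concurrent readers; a real implementation might prefer a concurrent map with weak keys.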
[jira] [Commented] (HBASE-5336) Spurious exceptions in HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201536#comment-13201536 ] Lars Hofhansl commented on HBASE-5336: -- Interestingly I find no matching logs on the RegionServers or Datanodes. I feel like I have seen a jira about this before, but I cannot find it. Spurious exceptions in HConnectionImplementation Key: HBASE-5336 URL: https://issues.apache.org/jira/browse/HBASE-5336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl I have seen this on the client a few times during heavy write testing:
java.util.concurrent.ExecutionException: java.io.IOException: java.io.IOException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730)
    at NewsFeedCreate.insert(NewsFeedCreate.java:91)
    at NewsFeedCreate$1.run(NewsFeedCreate.java:38)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: java.io.IOException: java.lang.NullPointerException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363)
    at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy1.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
    ... 7 more
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. 
Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201563#comment-13201563 ] Zhihong Yu commented on HBASE-5229: --- For my first comment, RowMutation maintains a single row in internalAdd(). So it should be fine passing the row directly to internalMutate(). Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase-rows. 2. Define a prefix-length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list with the HBASE-3584 and HBASE-5203 committed, supporting atomic cross row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion, I'll attach a patch (with tests) momentarily to make this concrete.
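The single-row invariant referred to above can be illustrated with a stand-alone sketch of an internalAdd()-style check (the names are illustrative, not the actual RowMutation code): a container created for one row rejects mutations addressed to any other row, which is what makes it safe to pass that row on to internalMutate().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of a RowMutation-like container enforcing that every added
// mutation targets the row the container was created for.
class RowMutationsSketch {
    private final byte[] row;
    private final List<byte[]> mutationRows = new ArrayList<>();

    RowMutationsSketch(byte[] row) {
        this.row = row;
    }

    // Mirrors the internalAdd() idea: reject a mutation whose row differs.
    void add(byte[] mutationRow) {
        if (!Arrays.equals(row, mutationRow)) {
            throw new IllegalArgumentException("mutation row does not match");
        }
        mutationRows.add(mutationRow);
    }

    byte[] getRow() {
        return row;
    }
}
```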
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201569#comment-13201569 ] Phabricator commented on HBASE-5074: mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. @dhruba; thanks for the fixes! Here are some more comments (I still have to go through the last 25% of the new version of the patch). INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:119 Please address this comment. The javadoc says major and the variable name says minor. src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:49 Please correct the misspelling. src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java:352 I think this function needs to be renamed to expectAtLeastMajorVersion for clarity src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 I think we should either consistently use the onDiskSizeWithHeader field or get rid of it. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java:220 Please do use a constant instead of 0 here for the minor version. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:3551 Long line src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:60 This lazy initialization is not thread-safe. This also applies to other enum members below. Can the meth field be initialized on the enum constructor, or do we rely on some classes being loaded by the time this initialization is invoked? 
src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:63-67 Avoid repeating org.apache.hadoop.util.PureJavaCrc32 three times in string form src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:74-75 Avoid repeating the java.util.zip.CRC32 string src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java:98-99 Avoid repeating the string src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:132 Fix indentation src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:174 Fix indentation src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java:71 Inconsistent formatting: 1024 +980. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
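The thread-safety comment on ChecksumType.java:60 can be addressed as the reviewer suggests, by resolving the checksum class once in the enum constructor (enum constants are initialized during class initialization, which the JVM performs under a lock) and keeping each class name in a single string. A hedged sketch, not the committed code:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of eager, thread-safe checksum resolution in an enum constructor.
// The Hadoop class names come from the review comment; the enum shape is
// illustrative only.
enum ChecksumTypeSketch {
    CRC32C("org.apache.hadoop.util.PureJavaCrc32C"),
    CRC32_JAVA("org.apache.hadoop.util.PureJavaCrc32");

    private final Class<?> impl; // resolved once, at class-init time

    ChecksumTypeSketch(String className) {
        Class<?> c;
        try {
            c = Class.forName(className); // pure-Java impl if on the classpath
        } catch (ClassNotFoundException e) {
            c = CRC32.class;              // fall back to java.util.zip.CRC32
        }
        this.impl = c;
    }

    Checksum newChecksum() {
        try {
            return (Checksum) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate checksum", e);
        }
    }
}
```

Because the reflective lookup happens exactly once per constant, the lazy-initialization race and the repeated class-name strings both disappear; the only cost is that a missing class is discovered at class-load time rather than first use.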
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201614#comment-13201614 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Todd: I agree with you. It is messy that the HFileSystem interface is leaking out to the unit tests. Instead, inside HFile, I can do something like this when a Reader is created: if (!(fs instanceof HFileSystem)) { fs = new HFileSystem(fs); } What this means is that users of HFile that already pass in an HFileSystem will get the new behaviour, while HRegionServer anyway voluntarily creates an HFileSystem before invoking HFile, so it works. I did not do this earlier because I thought that 'using reflection' is costly, but on second thought the cost is not much because it will be done only once when a new reader is created for the first time. What do you think? REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
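The wrapping described in the comment above (with the parentheses the instanceof test needs) amounts to something like this sketch; the stub types replace Hadoop's FileSystem and the patch's HFileSystem so the snippet compiles on its own.

```java
// Stand-ins for Hadoop's FileSystem and the patch's HFileSystem wrapper.
class FileSystemStub {}

class HFileSystemStub extends FileSystemStub {
    final FileSystemStub wrapped;
    HFileSystemStub(FileSystemStub fs) { this.wrapped = fs; }
}

// Sketch of the reader-creation path: callers that already pass an
// HFileSystem keep it; plain FileSystems get wrapped exactly once.
class ReaderFactorySketch {
    static HFileSystemStub ensureHFileSystem(FileSystemStub fs) {
        // Runs once per reader, off the hot path, so the cost is negligible.
        if (!(fs instanceof HFileSystemStub)) {
            fs = new HFileSystemStub(fs);
        }
        return (HFileSystemStub) fs;
    }
}
```

This keeps HFileSystem confined to the hfile package, which is the outcome both reviewers were after.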
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201619#comment-13201619 ] Phabricator commented on HBASE-5074: todd has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Yea, I think the instanceof check and confining HFileSystem to be only within the hfile package is much better. I don't think it should be costly -- as you said, it's only when the reader is created, which isn't on the hot code path, and instanceof checks are actually quite fast. They turn into a simple compare of the instance's klassid header against a constant, if I remember correctly. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624 ] jirapos...@reviews.apache.org commented on HBASE-5229: -- bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. 
Obviously this is an advanced feature, and this is prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. 
--- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross region) transactions are not discussed here.
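The co-location idea above (keep all of a tenant's rows in one region so that one multi-row atomic operation can cover them) can be sketched with a prefixed row-key scheme. This is an illustration only, not code from the HBASE-5229 patch; the class name, delimiter, and split-point helper are assumptions:

```java
// Illustrative sketch: prefixing row keys with a tenant id keeps that
// tenant's rows lexicographically contiguous, so presplitting the table
// on tenant boundaries places them all in the same region.
public class TenantRowKeys {
  private static final char SEP = '|'; // assumed delimiter

  /** Builds a row key that sorts all of a tenant's rows together. */
  public static String rowKey(String tenant, String row) {
    return tenant + SEP + row;
  }

  /** A presplit boundary that falls just after the tenant's last key. */
  public static String splitPointAfter(String tenant) {
    return tenant + (char) (SEP + 1);
  }
}
```

Note that this simple scheme assumes tenant ids contain no characters at or above the delimiter; a real key design would escape or length-prefix the tenant id.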
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201679#comment-13201679 ] Phabricator commented on HBASE-5074: mbautin has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Some more comments. I am still concerned about the copy-paste stuff in backwards-compatibility checking. Is there a way to minimize that? I also mentioned this in the comments below, but it would probably make sense to add more canned files in the no-checksum format generated by the old writer and read them with the new reader, the same way HFile v1 compatibility is ensured. I don't mind keeping the old writer code around in the unit test, but I think it is best to remove as much code from that legacy writer as possible (e.g. versatile API, toString, etc.) and only leave the parts necessary to generate the file for testing. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:164 Long line src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:83 Can this be made private if it is not accessed outside of this class? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:78 Use ALL_CAPS for constants src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 There seems to be a lot of copy-and-paste from the old HFileBlock code here. Is there a way to reduce that? I think we also need to create some canned old-format HFiles (using the old code) and read them with the new reader code as part of the test. src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 Make this class final. Also, it would make sense to strip this class down as much as possible to maintain the bare minimum of code required to test compatibility (if you have not done that already). 
src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Do we ever use this function? src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java:188 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java:356 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java:300 Is 0 the minor version with no checksums? If so, please replace it with a constant for readability. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
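As described earlier in this thread, the patch prefers a pure-Java CRC implementation when one is on the classpath and otherwise falls back to java.util.zip.CRC32. A minimal sketch of that reflection-based selection follows; the factory shape is an assumption for illustration and is not the actual patch code:

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of reflection-based checksum selection: try to load Hadoop's
// PureJavaCrc32, and fall back to java.util.zip.CRC32 when the class
// is not available. Both implement java.util.zip.Checksum.
public class ChecksumFactory {
  private static final String PURE_JAVA_CRC32 =
      "org.apache.hadoop.util.PureJavaCrc32"; // assumed class name

  public static Checksum newCrc32() {
    try {
      return (Checksum) Class.forName(PURE_JAVA_CRC32)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      return new CRC32(); // fallback path when the pure-Java CRC is absent
    }
  }
}
```

Since both implementations produce standard CRC-32 values, callers can swap them freely; the thread also notes the patch reworks this reflection path to keep per-object creation overhead low.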
[jira] [Created] (HBASE-5341) HBase build artifact should include security code by default
HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar is in the Maven repo at repository.a.o. I see no reason to do a separate artifact for the security-related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security-related code is loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile. -- This message is automatically generated by JIRA.
[jira] [Updated] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5341: - Component/s: security build
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201720#comment-13201720 ] Enis Soztutar commented on HBASE-5341: -- I can provide a patch, if we agree on this.
[jira] [Created] (HBASE-5342) Grant/Revoke global permissions
Grant/Revoke global permissions --- Key: HBASE-5342 URL: https://issues.apache.org/jira/browse/HBASE-5342 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar HBASE-3025 introduced simple ACLs based on coprocessors. It defines global/table/cf/cq level permissions. However, there is no way to grant/revoke global level permissions, other than the hbase.superuser conf setting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201818#comment-13201818 ] Enis Soztutar commented on HBASE-5341: -- Also, there is no secure artifact in the Maven repo, so depending on when 0.92.1 is cut, we might want to push 0.92.0-security as well.
[jira] [Created] (HBASE-5343) Access control API in HBaseAdmin.java
Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201841#comment-13201841 ] Gregory Chanan commented on HBASE-5317: --- @Ted: FSUtils.getTableDirs() excludes a specific list of directories. Specifically: {code} Arrays.asList(new String[]{ HREGION_LOGDIR_NAME, HREGION_OLDLOGDIR_NAME, CORRUPT_DIR_NAME, Bytes.toString(META_TABLE_NAME), Bytes.toString(ROOT_TABLE_NAME), SPLIT_LOGDIR_NAME })); {code} So if the MiniMRCluster creates a target directory, it will be returned. Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.92.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. 
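The exclusion-based filtering quoted above can be sketched as follows. The string values below are assumed values for the quoted constants (HREGION_LOGDIR_NAME, etc.), used only to illustrate the point under discussion: anything not on the exclusion list, such as a stray target directory created by MiniMRCluster, is treated as a table directory and returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of FSUtils.getTableDirs-style filtering: every directory under
// the HBase root that is not on the exclusion list counts as a table dir.
public class TableDirFilter {
  private static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList(
      ".logs", ".oldlogs", ".corrupt", ".META.", "-ROOT-", "splitlog"));

  public static List<String> tableDirs(List<String> rootDirs) {
    List<String> tables = new ArrayList<>();
    for (String dir : rootDirs) {
      if (!EXCLUDED.contains(dir)) {
        tables.add(dir); // e.g. an unexpected "target" dir lands here
      }
    }
    return tables;
  }
}
```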
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201848#comment-13201848 ] Zhihong Yu commented on HBASE-5317: --- Right. Can we find out why this target directory was created ?
[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams
[ https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201862#comment-13201862 ] Jean-Daniel Cryans commented on HBASE-3134: --- We can't hit ZK every time we replicate in order to see what the state is; each RS should instead have a watcher, and the check should be done locally. The rest looks good, thanks a lot for working on this. [replication] Add the ability to enable/disable streams --- Key: HBASE-3134 URL: https://issues.apache.org/jira/browse/HBASE-3134 Project: HBase Issue Type: New Feature Components: replication Reporter: Jean-Daniel Cryans Assignee: Teruyoshi Zenmyo Priority: Minor Labels: replication Fix For: 0.94.0 Attachments: HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch This jira was initially in the scope of HBASE-2201, but was pushed out since it has low value compared to the required effort (and we wanted to ship 0.90.0 rather soon). We need to design a way to enable/disable replication streams in a determinate fashion.
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201867#comment-13201867 ] Zhihong Yu commented on HBASE-5341: --- If we remove the maven 'security' profile, only secure HBase artifacts would be built, right ? Since most users wouldn't be using secure HBase features, I think this might introduce confusion for them.
[jira] [Commented] (HBASE-5333) Introduce Memstore backpressure for writes
[ https://issues.apache.org/jira/browse/HBASE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201880#comment-13201880 ] Jean-Daniel Cryans commented on HBASE-5333: --- I've done some brainstorming with Stack and the result was HBASE-5162. Introduce Memstore backpressure for writes Key: HBASE-5333 URL: https://issues.apache.org/jira/browse/HBASE-5333 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Currently if the memstore/flush/compaction cannot keep up with the write load, we block writers for up to hbase.hstore.blockingWaitTime milliseconds (default is 90000). Would be nice if there was a concept of a soft backpressure that slows writing clients gracefully *before* we reach this condition. From the log: 2012-02-04 00:00:06,963 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region table,,1328313512779.c2761757621ddf8fb78baf5288d71271. has too many store files; delaying flush up to 90000ms
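One way to picture the "soft" backpressure proposed above: instead of a hard block once the store-file threshold is hit, delay each write by an amount that grows as the threshold approaches. The limits and the linear ramp below are illustrative assumptions, not anything from HBase:

```java
// Illustrative sketch of graceful write backpressure: delay grows
// linearly from zero at a soft store-file limit up to a cap just
// before the hard limit where HBase would block outright.
public class SoftBackpressure {
  static final int SOFT_LIMIT = 7;      // start slowing writers (assumed)
  static final int HARD_LIMIT = 10;     // hard blocking threshold (assumed)
  static final long MAX_DELAY_MS = 100; // per-write delay cap (assumed)

  /** Milliseconds to delay a write, given the current store-file count. */
  public static long delayFor(int storeFiles) {
    if (storeFiles <= SOFT_LIMIT) {
      return 0; // healthy: no backpressure
    }
    int over = Math.min(storeFiles, HARD_LIMIT) - SOFT_LIMIT;
    return MAX_DELAY_MS * over / (HARD_LIMIT - SOFT_LIMIT);
  }
}
```

The appeal of a ramp like this is that clients slow down smoothly well before the hard limit, rather than running at full speed and then stalling for the full blocking wait.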
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201881#comment-13201881 ] Enis Soztutar commented on HBASE-5341: -- The only artifact built will be plain 0.92.1 or 0.94 (no -security appended), but it will include the security-related code. It's like Hadoop 1.0.0, which includes the security-related code in one artifact.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201884#comment-13201884 ] Jean-Daniel Cryans commented on HBASE-5267: --- I'm +1 with the patch, but eventually I'd still like to see something in the book about it. Add a configuration to disable the slab cache by default Key: HBASE-5267 URL: https://issues.apache.org/jira/browse/HBASE-5267 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Li Pi Priority: Blocker Fix For: 0.94.0, 0.92.1 Attachments: 5267.txt, 5267v2.txt, 5267v3.txt From what I commented at the tail of HBASE-4027: {quote} I changed the release note, the patch doesn't have a hbase.offheapcachesize configuration and it's enabled as soon as you set -XX:MaxDirectMemorySize (which is actually a big problem when you consider this: http://hbase.apache.org/book.html#trouble.client.oome.directmemory.leak). {quote} We need to add hbase.offheapcachesize and set it to false by default. Marking as a blocker for 0.92.1 and assigning to Li Pi at Todd's request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
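Per the ticket text, the fix amounts to gating the slab cache behind a configuration key that defaults to off, so that merely setting -XX:MaxDirectMemorySize no longer enables it. The property name is taken from the ticket; its exact semantics here are assumed for illustration:

```xml
<!-- hbase-site.xml: keep the experimental off-heap slab cache disabled
     even when -XX:MaxDirectMemorySize is set (proposed in HBASE-5267) -->
<property>
  <name>hbase.offheapcachesize</name>
  <value>false</value>
</property>
```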
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201885#comment-13201885 ] Zhihong Yu commented on HBASE-5341: --- The certification for 0.92.0 was for the insecure HBase artifact. If we only produce one secure artifact, would the certification process change ?
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201898#comment-13201898 ] Enis Soztutar commented on HBASE-5341: -- Sorry, I did not understand what you are referring to with the certification process. Do you mean voting for the RC, signing the release, etc?
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201901#comment-13201901 ] Zhihong Yu commented on HBASE-5267: --- I was about to create a sub-task but found that '3.2.10. Experimental off-heap cache' points to: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ Would an update of that blog suffice ?
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201907#comment-13201907 ] Zhihong Yu commented on HBASE-5341: --- Yes.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201909#comment-13201909 ] Jean-Daniel Cryans commented on HBASE-5267: --- Pointing to it might be a good option, along with a line or two on how to use it.
[jira] [Updated] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5267: -- Attachment: 5267v4.txt Patch v4 adds the new config parameter to src/docbkx/upgrading.xml
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201917#comment-13201917 ] Enis Soztutar commented on HBASE-5341: -- I don't see a reason for changing the release process. The vote for the 0.92.0 release included both the plain and secure artifacts, see http://comments.gmane.org/gmane.comp.java.hadoop.hbase.devel/25671
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201928#comment-13201928 ] Zhihong Yu commented on HBASE-5341: --- If you search for the voting process by entering the following in http://search-hadoop.com: 'ANN: The fifth hbase 0.92.0 release candidate is available for download' one can hardly tell whether the voters tested with the secure HBase tarball. If we produce one artifact (I think we should), some voters have to test security features before we declare a new release.
[jira] [Commented] (HBASE-5267) Add a configuration to disable the slab cache by default
[ https://issues.apache.org/jira/browse/HBASE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201964#comment-13201964 ] Hadoop QA commented on HBASE-5267: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12513543/5267v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/909//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/909//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/909//console This message is automatically generated. 
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201983#comment-13201983 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java:287 can you pl elaborate more on this comment? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:76 I think it is better to keep the compatibility code separate from existing live-test code. That way, it is guaranteed to never change. is there any other existing unit test that keeps a version1 file to run unit tests against? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:365 I did not strip it down, just so that it remains as it was earlier. This is for backward-compatibility, so isn't it better to keep as it was? src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java:800 Was useful while testing, but I will get rid of it. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201990#comment-13201990 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 But CRC32C is not installed by default. You would need hadoop 2.0 (not yet released) to get that. REVISION DETAIL https://reviews.facebook.net/D1521
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201997#comment-13201997 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/#review4852 --- I tried to use the command line tool to compress an HLog written by 0.92 and got the following: Exception in thread main java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192) at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104) at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64) Also, if you use the command line tool with no arguments, it should print its help (right now it prints an IndexOutOfBoundsException). I'll try again with an hlog written by trunk - I'm guessing the hlog serialization version might have changed or something. - Todd On 2012-01-24 22:29:18, Li Pi wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2740/ bq. --- bq. bq. (Updated 2012-01-24 22:29:18) bq. bq. bq. Review request for hbase, Eli Collins and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. HLog compression. Has unit tests and a command line tool for compressing/decompressing. bq. bq. bq. This addresses bug HBase-4608. bq. https://issues.apache.org/jira/browse/HBase-4608 bq. bq. bq. Diffs bq. - bq. bq. src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2740/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Li bq. bq. HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
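The dictionary idea in the HBASE-4608 description above can be sketched in a few lines. This is an illustrative stand-in, not the LRUDictionary class from the patch: a repeated string (table name, region id, CF name) is written literally the first time it appears, and referred to by a short index afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of WAL dictionary compression. Names and the unbounded
// map are assumptions; the real implementation is LRU-bounded.
class WalDictionary {
    private final Map<String, Short> toIndex = new HashMap<>();
    private final Map<Short, String> toEntry = new HashMap<>();
    private short next = 0;

    /** Returns the existing index, or -1 after registering a new entry. */
    public short findOrAdd(String entry) {
        Short idx = toIndex.get(entry);
        if (idx != null) {
            return idx;               // subsequent writes emit just this index
        }
        toIndex.put(entry, next);
        toEntry.put(next, entry);
        next++;
        return -1;                    // caller writes the literal bytes once
    }

    /** Reader side: resolve an index back to the original bytes. */
    public String lookup(short idx) {
        return toEntry.get(idx);
    }
}
```

On the reader side the dictionary is rebuilt in the same order, so a short index is enough to recover the full string without storing it again in every WAL entry.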
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202000#comment-13202000 ] Enis Soztutar commented on HBASE-5341: -- Agreed. Conceptually, security-related features are not very different from other features. They can be included in the code base, disabled by default, and marked as experimental if not tested well.
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202024#comment-13202024 ] Phabricator commented on HBASE-5074: tedyu has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 I don't see PureJavaCrc32 in hadoop 1.0 either. I think it would be nice to default to the best checksum class. src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 Would hbase.hstore.checksum.algo be a better name for this config parameter? REVISION DETAIL https://reviews.facebook.net/D1521
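The checksum-selection question in the comments above (prefer a pure-Java CRC implementation when it is on the classpath, otherwise fall back to java.util.zip.CRC32) can be sketched with a reflective factory. The class name below matches Hadoop's org.apache.hadoop.util.PureJavaCrc32, but whether it is available depends on the Hadoop version, so the fallback path is the only guaranteed one; this is a sketch, not the patch's actual ChecksumFactory.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hedged sketch: probe for a preferred Checksum class once, via
// reflection, and fall back to java.util.zip.CRC32 when absent.
class ChecksumPicker {
    private static final Class<?> PREFERRED =
        tryLoad("org.apache.hadoop.util.PureJavaCrc32");

    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null; // not on the classpath; use the JDK fallback
        }
    }

    public static Checksum newChecksum() {
        if (PREFERRED != null) {
            try {
                return (Checksum) PREFERRED.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                // fall through to the JDK implementation
            }
        }
        return new CRC32();
    }
}
```

Resolving the Class once in a static field keeps the per-block cost to a constructor call, which is the kind of "lower overhead" reworking the review thread mentions.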
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202026#comment-13202026 ] Phabricator commented on HBASE-5292: mbautin has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent counting getSize on compactions. @zhiqiu: does this problem exist in the open-source HBase trunk? If so, could you please port this patch to trunk? If this is not applicable to trunk, could you please set the JIRA status to resolved? Thanks! REVISION DETAIL https://reviews.facebook.net/D1527 getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated for both client-initiated Get/Scan operations as well as for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions.
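The fix the description calls for reduces to guarding the per-CF getsize increment with a compaction flag, so internal reads never touch the client-facing metric. A schematic, not the actual StoreScanner code; the class and method names below are illustrative:

```java
// Hedged sketch of the HBASE-5292 fix: only bytes returned to clients
// update the getsize metric; compaction scanners skip the increment.
class GetSizeMetric {
    private long bytesReturned = 0;
    private final boolean isCompaction; // assumed flag set at scanner creation

    GetSizeMetric(boolean isCompaction) {
        this.isCompaction = isCompaction;
    }

    /** Called when the query matcher returns an INCLUDE* code. */
    void onKeyValueIncluded(int kvLength) {
        if (!isCompaction) {
            bytesReturned += kvLength; // client-initiated Get/Scan only
        }
    }

    long getBytesReturned() {
        return bytesReturned;
    }
}
```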
[jira] [Created] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover
[89-fb] Scan unassigned region directory on master failover --- Key: HBASE-5344 URL: https://issues.apache.org/jira/browse/HBASE-5344 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin In case the master dies after a regionserver writes region state as OPENED or CLOSED in ZK but before the update is received by master and written to meta, the new master that comes up has to pick up the region state from ZK and write it to meta. Otherwise we can get multiply-assigned regions.
[jira] [Updated] (HBASE-5344) [89-fb] Scan unassigned region directory on master failover
[ https://issues.apache.org/jira/browse/HBASE-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5344: --- Attachment: D1605.1.patch mbautin requested code review of [jira] [HBASE-5344] [89-fb] Scan unassigned region directory on master failover. Reviewers: Kannan, Karthik, Liyin, JIRA, stack In case the master dies after a regionserver writes region state as OPENED or CLOSED in ZK but before the update is received by master and written to meta, the new master that comes up has to pick up the region state from ZK and write it to meta. Otherwise we can get multiply-assigned regions. The current solution tries to reassign the root region if it is unassigned but does not implement a work-around if META regions are missing. Also, it currently heavily relies on direct scanning of regionservers (reading regionserver list from ZK and doing an RPC on each regionserver to get the list of online regions). We were already doing that in master failover, but I am making it parallel here. 
TEST PLAN Unit tests, dev cluster, dark launch with killing regionservers and master REVISION DETAIL https://reviews.facebook.net/D1605 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionEventData.java src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java src/main/java/org/apache/hadoop/hbase/master/DirectRegionServerScanner.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java src/main/java/org/apache/hadoop/hbase/master/RegionManager.java src/main/java/org/apache/hadoop/hbase/master/RootScanner.java src/main/java/org/apache/hadoop/hbase/master/ServerManager.java src/main/java/org/apache/hadoop/hbase/master/ZKUnassignedWatcher.java src/main/java/org/apache/hadoop/hbase/master/handler/MasterOpenRegionHandler.java src/test/java/org/apache/hadoop/hbase/master/TestRegionStateOnMasterFailure.java
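The parallelized regionserver scan mentioned in the revision summary (one "list online regions" RPC per server, issued concurrently instead of serially) can be sketched with an executor. This is a sketch under stated assumptions: listOnlineRegions stands in for the real RPC and is not an actual HBase API here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hedged sketch: fan the per-server RPCs out to a thread pool and
// collect the results, so failover latency is bounded by the slowest
// server rather than the sum over all servers.
class ParallelRegionScan {
    static List<String> scan(List<String> servers,
            Function<String, List<String>> listOnlineRegions)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String server : servers) {
                // one simulated RPC per regionserver, in parallel
                futures.add(pool.submit(() -> listOnlineRegions.apply(server)));
            }
            List<String> allRegions = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                allRegions.addAll(f.get());
            }
            return allRegions;
        } finally {
            pool.shutdown();
        }
    }
}
```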
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202032#comment-13202032 ] jirapos...@reviews.apache.org commented on HBASE-4608: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/#review4853 --- I tried the compression tool on a log created by YCSB in load mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think. I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10650 invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10651 I think the better way of expressing this usage would be: WALCompressor [-u | -c] input output -u - uncompresses the input log -c - compresses the output log Exactly one of -u or -c must be specified src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java https://reviews.apache.org/r/2740/#comment10649 this code doesn't work properly. 
Here's what you want to do: Configuration conf = new Configuration(); FileSystem fs = path.getFileSystem(conf); - Todd On 2012-01-24 22:29:18, Li Pi wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2740/ bq. --- bq. bq. (Updated 2012-01-24 22:29:18) bq. bq. bq. Review request for hbase, Eli Collins and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. HLog compression. Has unit tests and a command line tool for compressing/decompressing. bq. bq. bq. This addresses bug HBase-4608. bq. https://issues.apache.org/jira/browse/HBase-4608 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf bq. src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2740/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Li bq. bq. 
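Todd's inline comments above boil down to two small fixes in the tool's argument handling, plus obtaining the FileSystem from the input Path itself (path.getFileSystem(conf)) rather than the default filesystem. The argument-handling part can be sketched in plain Java; the class and method names here are illustrative, not the actual Compressor code:

```java
// Hypothetical argument handling for the WALCompressor tool discussed above,
// illustrating Todd's two review points: check args.length BEFORE indexing
// into args (otherwise running the tool with no arguments throws
// ArrayIndexOutOfBoundsException), and accept exactly one of -u / -c.
class WALCompressorArgs {
  static String validate(String[] args) {
    // The length check must come first in the || chain, or args[0] below
    // is evaluated against an empty array.
    if (args.length != 3 || (!args[0].equals("-u") && !args[0].equals("-c"))) {
      return "Usage: WALCompressor [-u | -c] input output";
    }
    return args[0].equals("-u") ? "uncompress" : "compress";
  }

  public static void main(String[] args) {
    System.out.println(validate(args));
  }
}
```

Because `||` short-circuits left to right, putting the length check first guarantees the array is never indexed when it is empty, which is exactly the inversion the review asks for.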
HLog Compression Key: HBASE-4608 URL: https://issues.apache.org/jira/browse/HBASE-4608 Project: HBase Issue Type: New Feature Reporter: Li Pi Assignee: Li Pi Attachments: 4608v1.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current
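The zero-byte census Todd describes ("a quick C program to count the number of 0x00 bytes") is easy to reproduce. A hedged Java equivalent (the command-line handling and fallback buffer are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Java equivalent of the "quick C program" mentioned above: count 0x00 bytes
// in a log file as a rough estimate of how much a varint/compression pass
// could still save. Pass a WAL file path, or run with no args for a demo.
class ZeroByteCounter {
  static long countZeros(byte[] data) {
    long zeros = 0;
    for (byte b : data) {
      if (b == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0]))
                                  : new byte[] {0, 'k', 0, 0, 'v'};
    System.out.printf("%d of %d bytes are 0x00%n", countZeros(data), data.length);
  }
}
```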
[jira] [Commented] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202050#comment-13202050 ] Phabricator commented on HBASE-5292: zhiqiu has commented on the revision [jira] [HBASE-5292] [89-fb] Prevent counting getSize on compactions. @mbautin Sure. I'll port it to open-source trunk right now. Thank you so much for reminding me of this. :D REVISION DETAIL https://reviews.facebook.net/D1527 getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated for both client-initiated Get/Scan operations as well as for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via a: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5301) Some RegionServer metrics have really confusing names
[ https://issues.apache.org/jira/browse/HBASE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated HBASE-5301: Component/s: metrics Description: Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric. There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name? On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. was: Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric.
There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name? On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. Labels: metrics (was: ) Some RegionServer metrics have really confusing names - Key: HBASE-5301 URL: https://issues.apache.org/jira/browse/HBASE-5301 Project: HBase Issue Type: Bug Components: metrics Reporter: Doug Meil Labels: metrics Mikael Sitruk commented on this back in Nov 2011 and after looking at this I completely agree with him. For example, flushSize_avg_time makes no sense. flushSize is in bytes, so is this the average flush size? Or the average time per flush? In which case, why not call the measure flush_avg_time. But to add to the confusion there is already a flushTime_avg_time metric. There is also flushTime_num_ops and flushSize_num_ops that are confusing. Is the former the number of flushes? In which case, why have time in the metric name?
On 11/22/11 5:23 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Hi I have enabled metrics on an HBase cluster (0.90.1), and mapped the metrics to 3 categories (missing, Present but not documented/Incomplete documentation, Ok) according to their status in the book ( http://hbase.apache.org/book.html#hbase_metrics). Is it possible to update the book accordingly? It seems also that rpc metrics are not documented at all. And now some questions on the metrics: I can see some metrics present a num_ops and avg_time suffix (like rpc) but it seems that for certain metrics it is totally unclear (to me at least) or their names are misleading - for example what do compactionTime_avg_time/compactionTime_num_ops mean? Or flushSize_avg_time and flushSize_num_ops? I mean I would have understood compaction_avg_time and flushSize or flush_avg_time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202057#comment-13202057 ] Anoop Sam John commented on HBASE-2038: --- Hi Alex, Thanks for your reply. Yes, I had seen your past comment. I am checking the trunk coprocessor code for this work as of now. What is your comment on my first comment, that HRegionServer next(final long scannerId, int nbRows) calls the coprocessor preScannerNext() passing the RegionScanner? On this we cannot make a seek(). Thanks Anoop Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so that region level indexing can be reimplemented as a coprocessor without any loss of functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan resolved HBASE-5321. --- Resolution: Fixed Committed to 0.90. this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90. Key: HBASE-5321 URL: https://issues.apache.org/jira/browse/HBASE-5321 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5321.patch In HBASE-5160 we do not wait for TM to assign the regions after the first RS comes online. After doing this the variable this.allRegionServersOffline needs to be reset which is not done in 0.90. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Attachment: HBASE-5323.patch Patch for 0.90 Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7 Attachments: HBASE-5323.patch, HBASE-5323.patch We know that while parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, it lands in the run method of EventHandler. This kills the SSH thread and so further assignment does not happen; if ROOT and META are to be assigned, they cannot be. I think in this condition we should abort the master by catching such exceptions. Please do suggest. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
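The proposed fix can be sketched abstractly. This is a toy model with hypothetical names, not the actual EventHandler/ServerShutdownHandler code: the point is that run() catches Throwable (which includes AssertionError from the HLog length check) and escalates to an abort, instead of letting the error silently kill the handler thread and strand ROOT/META assignment.

```java
// Toy model of the fix proposed above. An AssertionError is an Error, not an
// Exception, so a plain catch (Exception e) would still let it kill the
// thread; catching Throwable and aborting makes the failure loud.
class EventHandlerSketch {
  interface Server {
    void abort(String why, Throwable cause);
  }

  static void run(Server server, Runnable process) {
    try {
      process.run();
    } catch (Throwable t) { // includes AssertionError from HLog length checks
      server.abort("Caught throwable while processing event", t);
    }
  }
}
```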
[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java
[ https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202081#comment-13202081 ] Gary Helmling commented on HBASE-5343: -- Adding coprocessor specific methods to {{HBaseAdmin}} completely undermines the purpose of coprocessors as optionally enabled extensions, and fails to scale as features are added. Having {{HBaseAdmin}} be a jumble of methods related to specific coprocessors is not very user friendly either. Security usage requires that {{SecureRpcEngine}} be loaded and that {{AccessController}} be enabled. Yes, configuring these components is more complicated than it needs to be right now. But providing interfaces to these two optional components as a permanent part of the client-facing API presented by {{HBaseAdmin}} is not the solution. If {{AccessControllerProtocol}} is too difficult to work with, then I think we would be better off with a simple client helper, like a {{SecurityClient}} class similar to the {{Constraints}} helper that was implemented for the constraints coprocessor. Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
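Gary's suggested alternative can be sketched as a thin static facade, modeled on the Constraints helper. Everything here is hypothetical: the AccessProtocol interface and the grant() signature are illustrative stand-ins, not the real AccessControllerProtocol API.

```java
// Hypothetical sketch of the SecurityClient helper suggested above: a static
// facade so callers never touch the coprocessor endpoint directly, and
// HBaseAdmin stays free of coprocessor-specific methods.
class SecurityClientSketch {
  interface AccessProtocol { // stand-in for the coprocessor endpoint
    void grant(String user, String table, String actions);
  }

  static void grant(AccessProtocol endpoint, String user, String table,
                    String actions) {
    // A real helper would locate the endpoint from an HTable and handle
    // serialization; the facade keeps that plumbing out of the client API.
    endpoint.grant(user, table, actions);
  }
}
```

The design point is that the helper ships with the optional component, so clusters that never enable security never see the API.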
[jira] [Commented] (HBASE-2038) Coprocessors: Region level indexing
[ https://issues.apache.org/jira/browse/HBASE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202088#comment-13202088 ] Lars Hofhansl commented on HBASE-2038: -- Unfortunately there is no seeking in the coprocessors, yet. They work more like a filter on a real scan. Seeking is done one (or actually two) levels deeper: seeking happens in the StoreScanners, while coprocessors see RegionScanners. It is not entirely clear to me where to hook this up in that API. It might be possible to provide a custom filter to do that. Filters operate at the StoreScanner level, and so can (and do) provide seek hints to the calling scanner. Coprocessors: Region level indexing --- Key: HBASE-2038 URL: https://issues.apache.org/jira/browse/HBASE-2038 Project: HBase Issue Type: New Feature Components: coprocessors Reporter: Andrew Purtell Priority: Minor HBASE-2037 is a good candidate to be done as a coprocessor. It also serves as a good goalpost for coprocessor environment design -- there should be enough of it so that region level indexing can be reimplemented as a coprocessor without any loss of functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
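The seek-hint mechanism Lars mentions can be modeled in a few lines. Real HBase expresses this through the Filter API's SEEK_NEXT_USING_HINT return code and getNextKeyHint(); the types below are deliberately simplified stand-ins, not the actual classes.

```java
// Simplified model of filter-driven seeking: the filter returns a "seek with
// hint" code plus a target key, and the StoreScanner layer (not the
// coprocessor layer) performs the actual reseek.
class SeekHintSketch {
  enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

  interface HintingFilter {
    ReturnCode filterKey(String key);
    String getNextKeyHint(String key); // consulted only on SEEK_NEXT_USING_HINT
  }

  // An index-style filter that jumps straight to keys with a given prefix
  // instead of scanning every row in between.
  static HintingFilter prefixSeeker(final String prefix) {
    return new HintingFilter() {
      public ReturnCode filterKey(String key) {
        return key.startsWith(prefix) ? ReturnCode.INCLUDE
                                      : ReturnCode.SEEK_NEXT_USING_HINT;
      }

      public String getNextKeyHint(String key) {
        return prefix; // seek target: the first key at the prefix
      }
    };
  }
}
```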
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by default
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202091#comment-13202091 ] Gary Helmling commented on HBASE-5341: -- This would break the ability to compile HBase 0.92+ against Hadoop releases without security. Even though we currently compile against 1.0 by default, we haven't blocked the ability to compile against previous versions. So this would be a change, especially if there's anyone out there running on builds of the 0.20-append branch. We've also been discussing moving in the direction of a modular build (see HBASE-4336). The current security/ tree is practically a module already, just lacking its own pom.xml. Moving security/ back up into src/ would be a step in the opposite direction, keeping us with a monolithic release. This may make the packaging slightly simpler, but it still won't make it any easier to test out all the combinations of optional components. The security profile does currently use its own configuration for testing, so at least we get execution of the full test suite using {{SecureRpcEngine}}. The full test suite is really overkill for this kind of testing; a good set of RPC-focused tests would do, if we had them. But in my opinion that kind of focused testing would be easier to handle in a security module than as part of a monolithic build. HBase build artifact should include security code by default Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar HBase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar is in the maven repo at repository.a.o.
I see no reason to do a separate artifact for the security-related code, since 0.92 already depends on secure Hadoop 1.0.0, and none of the security-related code is loaded by default. In this issue I propose we merge the code under security/ into src/ and remove the maven profile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201624#comment-13201624 ] Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM: -- bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars was (Author: jirapos...@reviews.apache.org): bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4152 bq. https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4152 bq. bq. What if rm contains more than one Mutation ? Hopefully rm does contain more than one Mutation, otherwise using this API is pointless. :) It is guaranteed, though, that all Mutations are for this single row. Do you see a concern? bq. On 2012-02-06 20:24:17, Ted Yu wrote: bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 4171 bq.
https://reviews.apache.org/r/3748/diff/3/?file=72610#file72610line4171 bq. bq. else is not needed considering exception is thrown on line 4170. Right. But this makes the flow clear. Personally I am not a big fan of having to look through code and having to piece together the control flow by tracking exceptions and return statements. I don't mind changing it, though. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq.
[jira] [Issue Comment Edited] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201556#comment-13201556 ] Lars Hofhansl edited comment on HBASE-5229 at 2/7/12 5:42 AM: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted was (Author: jirapos...@reviews.apache.org): --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3748/#review4844 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10643 What if rm contains more than one Mutation ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/3748/#comment10642 else is not needed considering exception is thrown on line 4170. - Ted On 2012-02-06 19:51:58, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3748/ bq. --- bq. bq. (Updated 2012-02-06 19:51:58) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. This builds on HBASE-3584, HBASE-5203, and HBASE-5304. bq. bq. Multiple Rows can be locked and applied atomically as long as the application ensures that all rows reside in the same Region (by presplitting or a custom RegionSplitPolicy). bq. At SFDC we can use this to colocate subsets of a tenant's data and allow atomic operations over these subsets. bq. bq. 
Obviously this is an advanced features and this prominently called out in the Javadoc. bq. bq. bq. This addresses bug HBASE-5229. bq. https://issues.apache.org/jira/browse/HBASE-5229 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiRowMutation.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 1241120 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 1241120 bq. bq. Diff: https://reviews.apache.org/r/3748/diff bq. bq. bq. Testing bq. --- bq. bq. Tests added to TestFromClientSide and TestAtomicOperation bq. bq. bq. Thanks, bq. bq. Lars bq. bq. Explore building blocks for multi-row local transactions. 
--- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for
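The building block under review above can be modeled abstractly. This is a toy sketch (not the actual HRegion code) of the contract: a batch of row mutations is applied atomically only if every row falls inside one region's [startKey, endKey) range, a colocation the application must guarantee via presplitting or a custom RegionSplitPolicy.

```java
import java.util.List;
import java.util.Map;

// Toy model of a multi-row local transaction: validate every row against the
// region's key range first, then apply the whole batch, so a bad row leaves
// the region completely unmodified.
class MultiRowMutationSketch {
  static void mutateRowsAtomically(Map<String, String> region, String startKey,
                                   String endKey, List<String[]> mutations) {
    for (String[] m : mutations) { // m = {row, value}
      if (m[0].compareTo(startKey) < 0 || m[0].compareTo(endKey) >= 0) {
        throw new IllegalArgumentException("Row " + m[0] + " is not in region");
      }
    }
    // In HBase this is where all row locks would be held and the batch
    // written under a single WAL sync.
    for (String[] m : mutations) {
      region.put(m[0], m[1]);
    }
  }
}
```

Validate-then-apply is what makes the operation all-or-nothing at this level; the real implementation additionally holds the row locks for the duration of the write.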
[jira] [Commented] (HBASE-5342) Grant/Revoke global permissions
[ https://issues.apache.org/jira/browse/HBASE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202093#comment-13202093 ] Gary Helmling commented on HBASE-5342: -- Some of the building blocks for this are already in place. It shouldn't be too difficult to fill in the missing pieces. Would be great to see this completed. Grant/Revoke global permissions --- Key: HBASE-5342 URL: https://issues.apache.org/jira/browse/HBASE-5342 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar HBASE-3025 introduced simple ACLs based on coprocessors. It defines global/table/cf/cq level permissions. However, there is no way to grant/revoke global level permissions, other than the hbase.superuser conf setting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202097#comment-13202097 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:160 my choice would be to make java's CRC32 the default. PureJavaCrc32 is compatible with java's CRC32. However, PureJavaCrc32C is not compatible with either of these. Although PureJavaCrc32 is not part of 1.0, if and when you move to hadoop 2.0, you will automatically get the more performant algorithm via PureJavaCrc32. For the adventurous, one can manually pull PureJavaCrc32C into one's own hbase deployment by explicitly setting hbase.hstore.checksum.algorithm to CRC32C. Does that sound reasonable? src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:257 sounds good, will make this change. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
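The classpath-dependent fallback described in this patch (use the faster pure-Java CRC when available, else java.util.zip.CRC32, with each bytesPerChecksum chunk storing a 4-byte checksum) can be sketched as follows. This is a simplified model, not the patch's actual factory code; the Hadoop class name is loaded reflectively so the sketch has no hard dependency on it.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Sketch of the fallback policy: prefer Hadoop's PureJavaCrc32 when it is on
// the classpath, otherwise fall back to java.util.zip.CRC32, which computes
// the same CRC-32 values. The reflective lookup would be cached in a real
// implementation to keep per-checksum overhead low.
class ChecksumFactory {
  static Checksum newCrc32() {
    try {
      return (Checksum) Class.forName("org.apache.hadoop.util.PureJavaCrc32")
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      return new CRC32(); // JDK fallback, identical checksum values
    }
  }

  static int checksum(byte[] chunk) {
    Checksum crc = newCrc32();
    crc.update(chunk, 0, chunk.length);
    return (int) crc.getValue(); // CRC-32 fits in the 4 bytes stored per chunk
  }
}
```

Note that CRC32C is a different polynomial, so its values are not interchangeable with CRC32; that is why the algorithm name is recorded per file rather than guessed at read time.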
[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5292: --- Attachment: D1617.1.patch zhiqiu requested code review of [jira] [HBASE-5292] Prevent counting getSize on compactions. Reviewers: Kannan, mbautin, Liyin, JIRA Added two separate metrics for both get() and next(). This is done by refactoring the internal next() API. To be more specific, only Get.get() and ResultScanner.next() pass the metric name (getsize and nextsize respectively) to HRegion::RegionScanner::next(List<KeyValue>, String). This will eventually hit StoreScanner::next(List<KeyValue>, int, String), where the metrics are counted. And their call paths are:
1) Get: HTable::get(final Get get) => HRegionServer::get(byte[] regionName, Get get) => HRegion::get(final Get get, final Integer lockid) => HRegion::get(final Get get) [pass METRIC_GETSIZE to the callee] => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric) => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric) => HRegion::RegionScanner::nextInternal(int limit, String metric) => KeyValueHeap::next(List<KeyValue> result, int limit, String metric) => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
2) Next: HTable::ClientScanner::next() => ScannerCallable::call() => HRegionServer::next(long scannerId) => HRegionServer::next(final long scannerId, int nbRows) [pass METRIC_NEXTSIZE to the callee] => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric) => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric) => HRegion::RegionScanner::nextInternal(int limit, String metric) => KeyValueHeap::next(List<KeyValue> result, int limit, String metric) => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
Task ID: #898948 Blame Rev: TEST PLAN 1. Passed unit tests. 2.
Created a test case, TestRegionServerMetrics::testGetNextSize, to guarantee that:
* Get/Next contributes to the getsize/nextsize metrics
* Both getsize and nextsize are per column family
* Flush/compaction won't affect these two metrics
Revert Plan: Tags: REVISION DETAIL https://reviews.facebook.net/D1617 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/regionserver/InternalScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Attachments: D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per CF. [Note: We already have metrics to track the # of HFileBlocks read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client-initiated Get/Scan operations and for compaction-related reads.
The metric is updated in StoreScanner.java:next() when the scan query matcher returns an INCLUDE* code, via: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in the case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
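The fix described above can be sketched with a toy model. The class and method shape below mirror StoreScanner::next(List<KeyValue>, int, String) from the call paths in the review, but this is a simplified, self-contained stand-in, not the real HBase API: client Gets/Scans pass a metric name down to next(), while compaction scanners pass null, so compaction reads never touch the per-CF size metrics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for StoreScanner: a size metric is only incremented when a
// metric name is supplied; compaction scanners pass null and are skipped.
public class ToyStoreScanner {
    private final Map<String, Long> metrics = new HashMap<>();
    private final List<byte[]> cells;

    public ToyStoreScanner(List<byte[]> cells) { this.cells = cells; }

    // Mirrors the shape of StoreScanner::next(List<KeyValue>, int, String metric).
    public boolean next(List<byte[]> out, int limit, String metric) {
        int taken = 0;
        while (!cells.isEmpty() && taken < limit) {
            byte[] kv = cells.remove(0);
            out.add(kv);
            taken++;
            if (metric != null) { // null metric means a compaction read: no accounting
                metrics.merge(metric, (long) kv.length, Long::sum);
            }
        }
        return !cells.isEmpty();
    }

    public long getMetric(String name) { return metrics.getOrDefault(name, 0L); }

    public static void main(String[] args) {
        List<byte[]> data = new ArrayList<>(List.of(new byte[10], new byte[20]));
        ToyStoreScanner scanner = new ToyStoreScanner(data);
        List<byte[]> out = new ArrayList<>();
        scanner.next(out, 1, "getsize");   // client read: 10 bytes counted
        scanner.next(out, 1, null);        // compaction read: not counted
        System.out.println(scanner.getMetric("getsize")); // prints 10
    }
}
```

This is why the patch threads the metric name through every layer of the next() chain: the decision to count is made once, at the point the scanner is opened, rather than guessed at inside StoreScanner.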
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202100#comment-13202100 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Ted: I forgot to state that one can change the default checksum algorithm at any time. No disk format upgrade is necessary. Each hfile stores the checksum algorithm that is used to store data inside it. If today you use CRC32 and tomorrow you change the configuration setting to CRC32C, then new files that are generated (as part of memstore flushes and compactions) will start using CRC32C, while older files will continue to be verified via the CRC32 algorithm. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the data file and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage hardware offers.
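Dhruba's point — the checksum algorithm is recorded per hfile, so files written under the old and new default can coexist — can be illustrated with plain JDK classes. This is a hedged sketch: the one-byte algorithm tag and the dispatch method below are invented for illustration and are not HBase's actual on-disk format; only java.util.zip.CRC32 and java.util.zip.CRC32C (JDK 9+) are real APIs.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Hypothetical per-file algorithm tags; HBase's real trailer format differs.
    static final byte ALGO_CRC32 = 0;
    static final byte ALGO_CRC32C = 1;

    // Dispatch on the tag stored in each file, so data written with either
    // algorithm remains verifiable after the cluster-wide default changes.
    static Checksum forTag(byte tag) {
        switch (tag) {
            case ALGO_CRC32:  return new CRC32();
            case ALGO_CRC32C: return new CRC32C();
            default: throw new IllegalArgumentException("unknown tag " + tag);
        }
    }

    static long checksum(byte tag, byte[] data) {
        Checksum c = forTag(tag);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        // Standard check values for the two polynomials:
        System.out.printf("CRC32  = %08X%n", checksum(ALGO_CRC32, check));  // CBF43926
        System.out.printf("CRC32C = %08X%n", checksum(ALGO_CRC32C, check)); // E3069283
    }
}
```

The two algorithms use different polynomials and produce different values for the same bytes, which is exactly why the tag must live in the file rather than in the configuration alone.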
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202103#comment-13202103 ] stack commented on HBASE-5341: -- bq. This would break the ability to compile HBase 0.92+ against Hadoop releases without security. Perhaps we could entertain breaking this for 0.94.0? i.e. saying we only run on hadoops w/ security? (CDH3 has it? What doesn't have it that we want to run on by the time 0.94.0 is out?) On modularization, yes, if HBASE-4336 is done soon, security is a natural. Otherwise, we should do as Enis suggests. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.
[ https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202105#comment-13202105 ] Lars Hofhansl commented on HBASE-5229: -- It's even simpler. A coprocessor endpoint has access to its region. If I rename internalMutate from my patch to mutateRowsWithLocks(List<Mutation> mutations, Set<String> rowsToLock) and make it public in HRegion, it can be called from a coprocessor endpoint. It would not be exposed in HRegionInterface, HRegionServer, RegionServerServices, or the HTableInterfaces. Explore building blocks for multi-row local transactions. --- Key: HBASE-5229 URL: https://issues.apache.org/jira/browse/HBASE-5229 Project: HBase Issue Type: New Feature Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 5229-multiRow-v2.txt, 5229-multiRow.txt, 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt HBase should provide basic building blocks for multi-row local transactions. Local means that we do this by co-locating the data. Global (cross-region) transactions are not discussed here. After a bit of discussion two solutions have emerged: 1. Keep the row-key for determining grouping and location and allow efficient intra-row scanning. A client application would then model tables as HBase rows. 2. Define a prefix length in HTableDescriptor that defines a grouping of rows. Regions will then never be split inside a grouping prefix. #1 is true to the current storage paradigm of HBase. #2 is true to the current client-side API. I will explore these two with sample patches here. Was: As discussed (at length) on the dev mailing list, with HBASE-3584 and HBASE-5203 committed, supporting atomic cross-row transactions within a region becomes simple. I am aware of the hesitation about the usefulness of this feature, but we have to start somewhere. Let's use this jira for discussion; I'll attach a patch (with tests) momentarily to make this concrete.
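The mutateRowsWithLocks(List<Mutation> mutations, Set<String> rowsToLock) building block Lars describes can be sketched outside HBase with a toy region: take all requested row locks (in sorted order, to avoid deadlock between concurrent callers), apply every mutation, then release. Everything below is a simplified stand-in for illustration; only the method shape comes from the comment above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

public class ToyRegion {
    private final Map<String, String> rows = new HashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new HashMap<>();

    private ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // Toy analogue of HRegion::mutateRowsWithLocks(List<Mutation>, Set<String>):
    // all row locks are taken in sorted order before any write is applied, so a
    // concurrent reader holding the same locks sees all mutations or none.
    public void mutateRowsWithLocks(List<String[]> mutations, Set<String> rowsToLock) {
        SortedSet<String> ordered = new TreeSet<>(rowsToLock);
        for (String row : ordered) lockFor(row).lock();
        try {
            for (String[] m : mutations) rows.put(m[0], m[1]); // m = {row, value}
        } finally {
            for (String row : ordered) lockFor(row).unlock();
        }
    }

    public String get(String row) { return rows.get(row); }
}
```

Because the method lives on the region and all locked rows are local to it, this stays a "local" transaction in the sense of the issue: no cross-region coordination is needed.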
[jira] [Commented] (HBASE-5325) Expose basic information about the master-status through jmx beans
[ https://issues.apache.org/jira/browse/HBASE-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202108#comment-13202108 ] stack commented on HBASE-5325: -- @Enis metrics2 seems a bit out there for us (hadoop 0.23?). We want to run on 0.23 and 1.0 and 2.0, etc., so it'd be a while before we could lean on it. metrics2 has facility that would help? (I've not studied it). @Hitesh Regards "I am still digging into jmx internals but I could not find anything which mentions it as an option for pushing information", even if there was a means (IIRC there is but am likely off), I think we'd have the master pulling. bq. Having the master pull information from all region servers using jmx (or any other point-to-point protocol) would likely be a bad idea from a performance point of view. Currently every regionserver sends status every (configurable) second. It's a fat Writable serialization of each regionserver's counters and current state. IIRC, this mechanism runs mostly independent of and beside our metrics (so there'll be Writable serialization of region state, and if something like tsdb is running, there'll be a JMX serialization of server state happening too). Would be an improvement if we did metrics reporting one way only, if possible. bq. Also, was your intention to have the HMaster be a metric aggregator for the RegionServers' metrics? It does this now for key stats. bq. I still need to look at nesting of mbeans from various components and also need to look at the hbase code in more detail to see what kind of management options could be exposed via jmx. I'd be interested in what you think. We need to figure out being able to config a running cluster; i.e. change Configuration values and have hbase notice. Having this go via jmx would likely be like taking the 'killarney road to dingle' as my grandma used to say (it's shorter if you take the tralee road), so maybe jmx is read-only rather than 'management'.
Expose basic information about the master-status through jmx beans --- Key: HBASE-5325 URL: https://issues.apache.org/jira/browse/HBASE-5325 Project: HBase Issue Type: Improvement Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5325.1.patch, HBASE-5325.wip.patch Similar to the Namenode and Jobtracker, it would be good if the hbase master could expose some information through mbeans.
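A minimal example of the kind of MBean exposure being proposed here, using only javax.management from the JDK. The MasterStatus bean name, its ObjectName, and the RegionServerCount attribute are invented for illustration; they are not HBase's actual beans.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MasterStatusDemo {
    // Standard MBean pattern: an XxxMBean interface plus an implementing class Xxx.
    public interface MasterStatusMBean {
        int getRegionServerCount();
    }

    public static class MasterStatus implements MasterStatusMBean {
        public int getRegionServerCount() { return 3; } // illustrative value
    }

    // Register the bean, then read the attribute back through the platform
    // MBeanServer -- the same path a JMX client (jconsole, a tsdb collector)
    // would use remotely.
    public static int registerAndRead() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("toy.hbase:type=MasterStatus");
        if (!server.isRegistered(name)) {
            server.registerMBean(new MasterStatus(), name);
        }
        return (Integer) server.getAttribute(name, "RegionServerCount");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(registerAndRead()); // prints 3
    }
}
```

Note that the getter-only interface makes this a read-only bean, which fits stack's suggestion above that jmx be used for observation rather than 'management'.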
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202124#comment-13202124 ] Jonathan Hsieh commented on HBASE-5341: --- I'd be happy if the tarball(s) for 0.92.1 just come out with the ./security directory in the correct place in the tarball. I would think that would be the most expedient thing to do. (This is, however, likely easier said than done.) @Gary - does compiling against hdfs 1.0.0 and running this new hbase jar in non-secure mode against the append-branch hdfs jar work? If it works, I'm not convinced compilation matters as much. I do most of my system testing of these releases from jars compiled against hadoop 1.0.0 but running on top of a cdh3 hdfs version -- no problems. If the hbase binary doesn't work, then I agree that this is a concern -- it would block shops locked into their own hdfs branches that don't support hadoop security. - Using the module approach, the security stuff would be a separate jar that we could add to the classpath, right? @Ted - There are plenty of pieces we've released that haven't been tested by many people. That said, regardless of how it is included, I can commit that we'll be testing the access control and security features. @Stack - Hadoop 1.0.0 and CDH3 both have security and append. The next HDFS it seems most folks are coalescing around is 0.23, which also has security and append. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o.
I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose we merge the code under /security to src/ and remove the maven profile.
[jira] [Updated] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4658: Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Put attributes are not exposed via the ThriftServer --- Key: HBASE-4658 URL: https://issues.apache.org/jira/browse/HBASE-4658 Project: HBase Issue Type: Bug Components: thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1563.1.patch, D1563.1.patch, D1563.1.patch, D1563.2.patch, D1563.2.patch, D1563.2.patch, D1563.3.patch, D1563.3.patch, D1563.3.patch, ThriftPutAttributes1.txt The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift.
[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression
[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202127#comment-13202127 ] dhruba borthakur commented on HBASE-5313: - One option listed above is to keep all the keys in the beginning of the block and all the values at the end of the block. The keys will still be delta-encoded. The values can be lzo-compressed. Any other ideas out there? Restructure hfiles layout for better compression Key: HBASE-5313 URL: https://issues.apache.org/jira/browse/HBASE-5313 Project: HBase Issue Type: Improvement Components: io Reporter: dhruba borthakur Assignee: dhruba borthakur An HFile block contains a stream of key-values. Can we organize these kvs on disk in a better way so that we get much greater compression ratios? One option (thanks Prakash) is to store all the keys in the beginning of the block (let's call this the key section) and then store all their corresponding values towards the end of the block. This will allow us to not even decompress the values when we are scanning and skipping over rows in the block. Any other ideas?
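The layout idea above — keys at the front of the block, values at the back — can be sketched as a simple block encoding. The byte format below (key count, then length-prefixed keys, then length-prefixed values) is hypothetical and much simpler than any real HFile layout; it only shows how a scanner could walk the key section without ever parsing (or decompressing) the value section.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class KeySectionBlock {
    // Hypothetical block layout: [keyCount][len,key]*[len,value]*
    public static byte[] encode(List<byte[]> keys, List<byte[]> values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(keys.size());
        for (byte[] k : keys) { out.writeInt(k.length); out.write(k); }   // key section
        for (byte[] v : values) { out.writeInt(v.length); out.write(v); } // value section
        return buf.toByteArray();
    }

    // A scan that is skipping over rows reads only the key section at the
    // head of the block; the value section at the tail is never touched
    // (and in the proposal could stay lzo-compressed until a value is needed).
    public static List<String> readKeys(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        int n = in.readInt();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            byte[] k = new byte[in.readInt()];
            in.readFully(k);
            keys.add(new String(k));
        }
        return keys;
    }
}
```

Grouping keys with keys and values with values also tends to help the compressor, since similar byte patterns end up adjacent, which is the compression-ratio argument in the issue.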
[jira] [Commented] (HBASE-5343) Access control API in HBaseAdmin.java
[ https://issues.apache.org/jira/browse/HBASE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202133#comment-13202133 ] Andrew Purtell commented on HBASE-5343: --- bq. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in. -1 This issue should be resolved as 'invalid'. Access control API in HBaseAdmin.java --- Key: HBASE-5343 URL: https://issues.apache.org/jira/browse/HBASE-5343 Project: HBase Issue Type: Improvement Components: client, coprocessors, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar To use the access control mechanism added in HBASE-3025, users should either use the shell interface, or use the coprocessor API directly, which is not very user friendly. We can add grant/revoke/user_permission commands similar to the shell interface to HBaseAdmin assuming HBASE-5341 is in.
[jira] [Commented] (HBASE-5341) HBase build artifact should include security code by defult
[ https://issues.apache.org/jira/browse/HBASE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13202135#comment-13202135 ] Andrew Purtell commented on HBASE-5341: --- If we drop support for 0.20.x then compilation issues mostly go away. It won't fail for lack of security APIs in Hadoop. I lean with Gary's view that security belongs in a (Maven) module. HBase build artifact should include security code by defult Key: HBASE-5341 URL: https://issues.apache.org/jira/browse/HBASE-5341 Project: HBase Issue Type: Improvement Components: build, security Affects Versions: 0.94.0, 0.92.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Hbase 0.92.0 was released with two artifacts, plain and security. The security code is built with -Psecurity. There are two tarballs, but only the plain jar in maven repo at repository.a.o. I see no reason to do a separate artifact for the security related code, since 0.92 already depends on secure Hadoop 1.0.0, and all of the security related code is not loaded by default. In this issue, I propose, we merge the code under /security to src/ and remove the maven profile.