[jira] [Updated] (HBASE-5640) bulk load runs slowly than before
[ https://issues.apache.org/jira/browse/HBASE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5640: Attachment: bulkLoadFs2.txt It is better to compare the URIs than to use object equality. Object equality does not work because one object is of type FileSystem while the other is an HFileSystem. bulk load runs slowly than before - Key: HBASE-5640 URL: https://issues.apache.org/jira/browse/HBASE-5640 Project: HBase Issue Type: Bug Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Labels: bulkloader Attachments: bulkLoadFs1.txt, bulkLoadFs2.txt I am loading data from an external system into hbase. There are many log messages of the form "on different filesystem than destination store - moving to this filesystem". This is possibly a regression caused by a recent patch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
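The comment above contrasts comparing filesystem URIs with comparing objects. A minimal sketch of why identity fails when one side is a wrapper follows. The classes SimpleFs and WrappedFs are hypothetical stand-ins so the example runs on the JDK alone; they are not Hadoop's FileSystem or HBase's HFileSystem, and the attached patch may do this differently.

```java
import java.net.URI;

class FsCompare {
    // Object identity: false whenever one side is a wrapper around the other.
    static boolean sameByIdentity(SimpleFs a, SimpleFs b) {
        return a == b;
    }

    // URI comparison: true when both objects name the same underlying store.
    static boolean sameByUri(SimpleFs a, SimpleFs b) {
        return a.getUri().equals(b.getUri());
    }

    public static void main(String[] args) {
        SimpleFs raw = new SimpleFs(URI.create("hdfs://namenode:8020"));
        SimpleFs wrapped = new WrappedFs(raw);
        System.out.println("identity: " + sameByIdentity(raw, wrapped));
        System.out.println("uri:      " + sameByUri(raw, wrapped));
    }
}

// Hypothetical stand-in for a plain filesystem handle (illustration only).
class SimpleFs {
    private final URI uri;

    SimpleFs(URI uri) {
        this.uri = uri;
    }

    URI getUri() {
        return uri;
    }
}

// Hypothetical wrapper that reports the URI of the filesystem it wraps.
class WrappedFs extends SimpleFs {
    private final SimpleFs delegate;

    WrappedFs(SimpleFs delegate) {
        super(delegate.getUri());
        this.delegate = delegate;
    }

    SimpleFs getDelegate() {
        return delegate;
    }
}
```

With identity comparison the wrapper always looks like a different filesystem, which would match the symptom of files being moved needlessly; comparing URIs treats the wrapper and the wrapped object as the same store.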
[jira] [Updated] (HBASE-5640) bulk load runs slowly than before
[ https://issues.apache.org/jira/browse/HBASE-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5640: Attachment: bulkLoadFs1.txt This is the fix I have in mind but have not yet tested in great detail. bulk load runs slowly than before - Key: HBASE-5640 URL: https://issues.apache.org/jira/browse/HBASE-5640 Project: HBase Issue Type: Bug Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Labels: bulkloader Attachments: bulkLoadFs1.txt I am loading data from an external system into hbase. There are many log messages of the form "on different filesystem than destination store - moving to this filesystem". This is possibly a regression caused by a recent patch.
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5074: Status: Open (was: Patch Available) support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, D1521.13.patch, D1521.13.patch, D1521.14.patch, D1521.14.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
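The description above attributes the second disk iop to the checksum living in a separate file. One way to avoid that second read, sketched here under the assumption that the checksum is stored inline at the end of each block, is to verify it from the same bytes that were just read. The class name InlineChecksumBlock and the 4-byte CRC32 trailer are illustrative choices, not the format used by the attached patches.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class InlineChecksumBlock {
    private static final int CHECKSUM_BYTES = 4;

    // Lay out a block as [payload][4-byte CRC32 of payload].
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(payload.length + CHECKSUM_BYTES)
                .put(payload)
                .putInt((int) crc32(payload))
                .array();
    }

    // Verify and strip the trailer; a single read of the block supplies
    // both the data and the checksum.
    static byte[] decode(byte[] block) {
        int payloadLength = block.length - CHECKSUM_BYTES;
        if (payloadLength < 0) {
            throw new IllegalArgumentException("block shorter than checksum trailer");
        }
        ByteBuffer buffer = ByteBuffer.wrap(block);
        byte[] payload = new byte[payloadLength];
        buffer.get(payload);
        int stored = buffer.getInt();
        if ((int) crc32(payload) != stored) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }

    private static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = encode(new byte[] {1, 2, 3, 4});
        System.out.println("decoded bytes: " + decode(block).length);
    }
}
```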
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5074: Status: Patch Available (was: Open) support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1521.1.patch, D1521.1.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.10.patch, D1521.11.patch, D1521.11.patch, D1521.12.patch, D1521.12.patch, D1521.13.patch, D1521.13.patch, D1521.14.patch, D1521.14.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch, D1521.9.patch, D1521.9.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
[jira] [Updated] (HBASE-5473) Metrics does not push pread time
[ https://issues.apache.org/jira/browse/HBASE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5473: Assignee: dhruba borthakur Status: Patch Available (was: Open) Metrics does not push pread time Key: HBASE-5473 URL: https://issues.apache.org/jira/browse/HBASE-5473 Project: HBase Issue Type: Bug Components: metrics Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor The RegionServerMetrics is not pushing the pread times to the MetricsRecord
[jira] [Updated] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4658: Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Put attributes are not exposed via the ThriftServer --- Key: HBASE-4658 URL: https://issues.apache.org/jira/browse/HBASE-4658 Project: HBase Issue Type: Bug Components: thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: D1563.1.patch, D1563.1.patch, D1563.1.patch, D1563.2.patch, D1563.2.patch, D1563.2.patch, D1563.3.patch, D1563.3.patch, D1563.3.patch, ThriftPutAttributes1.txt The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift.
[jira] [Updated] (HBASE-4658) Put attributes are not exposed via the ThriftServer
[ https://issues.apache.org/jira/browse/HBASE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4658: Status: Patch Available (was: Open) Put attributes are not exposed via the ThriftServer --- Key: HBASE-4658 URL: https://issues.apache.org/jira/browse/HBASE-4658 Project: HBase Issue Type: Bug Components: thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1563.1.patch, D1563.1.patch, D1563.1.patch, D1563.2.patch, D1563.2.patch, D1563.2.patch, D1563.3.patch, D1563.3.patch, D1563.3.patch, ThriftPutAttributes1.txt The Put api also takes in a bunch of arbitrary attributes that an application can use to associate metadata with each put operation. This is not exposed via Thrift.
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5074: Status: Open (was: Patch Available) This patch is not yet ready for submission. It needs enhancement with a unit test and metrics collection. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch The current implementation of HDFS stores the data in one block file and the metadata (checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers.
[jira] [Updated] (HBASE-5295) Improve the Thrift API to switch on/off writing to wal for Mutations
[ https://issues.apache.org/jira/browse/HBASE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5295: Status: Patch Available (was: Open) Improve the Thrift API to switch on/off writing to wal for Mutations - Key: HBASE-5295 URL: https://issues.apache.org/jira/browse/HBASE-5295 Project: HBase Issue Type: Improvement Components: thrift Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1515.1.patch, D1515.1.patch, D1515.1.patch, D1515.1.patch The thrift api currently does not support switching off updating wal for Puts/Deletes. Support it.
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Status: Open (was: Patch Available) Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt, scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
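The proposal above adds a readPoint argument to the scanner. A minimal sketch of what reading at a read point can mean follows, assuming each cell carries the sequence number of the write that produced it and a scan returns only cells written at or before the read point. The Cell class and the scan method are hypothetical and are not HBase's HRegion API.

```java
import java.util.ArrayList;
import java.util.List;

class ReadPointScan {
    // Hypothetical cell: a value tagged with the sequence number of its write.
    static final class Cell {
        final String value;
        final long writeNumber;

        Cell(String value, long writeNumber) {
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    // Return only the values whose write number is at or below the read point,
    // so later writes stay invisible to this scan.
    static List<String> scan(List<Cell> cells, long readPoint) {
        List<String> visible = new ArrayList<>();
        for (Cell cell : cells) {
            if (cell.writeNumber <= readPoint) {
                visible.add(cell.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("a", 1));
        cells.add(new Cell("b", 5));
        cells.add(new Cell("c", 9));
        System.out.println(scan(cells, 5));
    }
}
```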
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Status: Patch Available (was: Open) Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt, scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Attachment: scannerMVCC1.txt Attaching the same patch file again Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt, scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-5014) PutSortReducer and KeyValueSortReduce should adhere to memory limits
[ https://issues.apache.org/jira/browse/HBASE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5014: Attachment: putSortReducer1.txt Attached patch from Review. PutSortReducer and KeyValueSortReduce should adhere to memory limits Key: HBASE-5014 URL: https://issues.apache.org/jira/browse/HBASE-5014 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: putSortReducer1.txt The PutSortReducer class has a configurable threshold to flush partial sorted data for large rows. However, it was not using the size of the key in the calculation of overall memory used.
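The description above says the flush threshold was checked against a memory figure that left out the key size. A small sketch of the difference follows, with hypothetical helper names (valueBytesOnly, keyAndValueBytes, shouldFlush) that are not taken from PutSortReducer itself.

```java
import java.nio.charset.StandardCharsets;

class FlushAccounting {
    // Undercounting variant: only value bytes are charged against the limit.
    static long valueBytesOnly(byte[][] values) {
        long total = 0;
        for (byte[] value : values) {
            total += value.length;
        }
        return total;
    }

    // Variant that also charges the key bytes for every entry.
    static long keyAndValueBytes(byte[][] keys, byte[][] values) {
        long total = valueBytesOnly(values);
        for (byte[] key : keys) {
            total += key.length;
        }
        return total;
    }

    // Flush once the accounted bytes reach the configured threshold.
    static boolean shouldFlush(long usedBytes, long thresholdBytes) {
        return usedBytes >= thresholdBytes;
    }

    public static void main(String[] args) {
        byte[][] keys = {"row1".getBytes(StandardCharsets.UTF_8)};
        byte[][] values = {"0123456789".getBytes(StandardCharsets.UTF_8)};
        long threshold = 12;
        System.out.println("values only: " + shouldFlush(valueBytesOnly(values), threshold));
        System.out.println("with keys:   " + shouldFlush(keyAndValueBytes(keys, values), threshold));
    }
}
```

With many small values and long keys, the undercounting variant can let the buffer grow well past the intended limit before a flush is triggered.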
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Attachment: scannerMVCC1.txt Attached patch from review. Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Status: Open (was: Patch Available) Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Status: Patch Available (was: Open) Submitting patch again, hoping that it will be picked up by committers and automatic build testing. Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: scannerMVCC1.txt There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-5014) PutSortReducer and KeyValueSortReduce should adhere to memory limits
[ https://issues.apache.org/jira/browse/HBASE-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-5014: Status: Patch Available (was: Open) Hi Kannan, I addressed your comment and ran all unit tests. PutSortReducer and KeyValueSortReduce should adhere to memory limits Key: HBASE-5014 URL: https://issues.apache.org/jira/browse/HBASE-5014 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: dhruba borthakur Assignee: dhruba borthakur The PutSortReducer class has a configurable threshold to flush partial sorted data for large rows. However, it was not using the size of the key in the calculation of overall memory used.
[jira] [Updated] (HBASE-4938) Create a HRegion.getScanner public method that allows reading from a specified readPoint
[ https://issues.apache.org/jira/browse/HBASE-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4938: Status: Patch Available (was: Open) I have run all the unit tests for this one. Create a HRegion.getScanner public method that allows reading from a specified readPoint Key: HBASE-4938 URL: https://issues.apache.org/jira/browse/HBASE-4938 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor There is an existing api HRegion.getScanner(Scan) that allows scanning a table. My proposal is to extend it to HRegion.getScanner(Scan, long readPoint)
[jira] [Updated] (HBASE-4989) Metrics to measure sequential reads and random reads separately
[ https://issues.apache.org/jira/browse/HBASE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4989: Attachment: metrics1.txt Patch for trunk. Metrics to measure sequential reads and random reads separately --- Key: HBASE-4989 URL: https://issues.apache.org/jira/browse/HBASE-4989 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: metrics1.txt HBase does sequential reads for compactions and positional random reads for satisfying user's queries. It would be nice if we can measure their latencies separately. It is mostly the random reads that dominate a transactional workload.
[jira] [Updated] (HBASE-4989) Metrics to measure sequential reads and random reads separately
[ https://issues.apache.org/jira/browse/HBASE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4989: Release Note: The metric fsReadLatency records the number of sequential reads. The metric fsPreadLatency records the number of random reads. Hadoop Flags: Incompatible change Status: Patch Available (was: Open) Metrics to measure sequential reads and random reads separately --- Key: HBASE-4989 URL: https://issues.apache.org/jira/browse/HBASE-4989 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Priority: Minor Attachments: metrics1.txt HBase does sequential reads for compactions and positional random reads for satisfying user's queries. It would be nice if we can measure their latencies separately. It is mostly the random reads that dominate a transactional workload.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut7.txt Addressed feedback comments from RamKrishna and Jonathan. Fixed unit test to not assert erroneously. Enhanced Memstore.rollback() to roll back keys from memstore and snapshot. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, appendNoSyncPut7.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut8.txt All unit tests (except DistributedLogSplitting) pass with this patch. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, appendNoSyncPut7.txt, appendNoSyncPut8.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Status: Patch Available (was: Open) The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt, appendNoSyncPut7.txt, appendNoSyncPut8.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut6.txt After some discussion with Ted and JGray, we decided that the rowlock is not necessary for HRegion.rollbackMemstore. This is the version of the patch that should satisfy all parties. Please review. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt, appendNoSyncPut6.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
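The ordering discussed in this issue (append and apply under the row lock, release the lock, then sync, and roll back the memstore without the row lock if the sync fails) can be sketched as a toy model. Everything here is an assumption made for illustration: the class DeferredSyncPut, its in-memory lists standing in for the memstore and the log, and the syncSucceeds switch are hypothetical and do not mirror HRegion or HLog code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class DeferredSyncPut {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final Object stateLock = new Object(); // guards the three lists below
    private final List<String> memstore = new ArrayList<>();
    private final List<String> unsyncedLog = new ArrayList<>();
    private final List<String> durableLog = new ArrayList<>();
    private final boolean syncSucceeds; // test switch simulating a sync failure

    DeferredSyncPut(boolean syncSucceeds) {
        this.syncSucceeds = syncSucceeds;
    }

    boolean put(String edit) {
        rowLock.lock();
        try {
            synchronized (stateLock) {
                unsyncedLog.add(edit); // append to the log without syncing
                memstore.add(edit);    // apply the edit in memory
            }
        } finally {
            rowLock.unlock();          // released before the slow sync
        }
        if (sync()) {
            return true;
        }
        synchronized (stateLock) {     // rollback does not retake the row lock
            memstore.remove(edit);
            unsyncedLog.remove(edit);
        }
        return false;
    }

    private boolean sync() {
        if (!syncSucceeds) {
            return false;
        }
        synchronized (stateLock) {
            durableLog.addAll(unsyncedLog);
            unsyncedLog.clear();
        }
        return true;
    }

    int memstoreSize() {
        synchronized (stateLock) {
            return memstore.size();
        }
    }

    int durableSize() {
        synchronized (stateLock) {
            return durableLog.size();
        }
    }

    boolean rowLockHeld() {
        return rowLock.isLocked();
    }

    public static void main(String[] args) {
        DeferredSyncPut region = new DeferredSyncPut(true);
        System.out.println("put ok: " + region.put("row1=a"));
    }
}
```

Because the row lock is held only for the in-memory work, other writers to the same hot row are not blocked for the duration of the sync.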
[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers
[ https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4588: Attachment: configVerify2.txt Addressed Ted's review comments. The floating point arithmetic to validate memory allocation configurations need to be done as integers -- Key: HBASE-4588 URL: https://issues.apache.org/jira/browse/HBASE-4588 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: dhruba borthakur Priority: Minor Fix For: 0.92.0 Attachments: configVerify1.txt, configVerify2.txt The floating point arithmetic to validate memory allocation configurations needs to be done as integers. On our cluster, we had block cache = 0.6 and memstore = 0.2. It was saying this was > 0.8 when it is actually equal. Minor bug but annoying nonetheless.
[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers
[ https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4588: Attachment: configVerify2.txt Attaching the appropriate patch file with review comment fixes. The floating point arithmetic to validate memory allocation configurations need to be done as integers -- Key: HBASE-4588 URL: https://issues.apache.org/jira/browse/HBASE-4588 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: dhruba borthakur Priority: Minor Fix For: 0.92.0 Attachments: configVerify1.txt, configVerify2.txt, configVerify2.txt The floating point arithmetic to validate memory allocation configurations needs to be done as integers. On our cluster, we had block cache = 0.6 and memstore = 0.2. It was saying this was > 0.8 when it is actually equal. Minor bug but annoying nonetheless.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut5.txt Fixed typos. Performance numbers were run on hbase-92 with a variant of hdfs 0.20. vanilla hdfs: 1200 puts/sec (no patch), 5000 puts/sec (with patch); synconsync hdfs: 80 puts/sec (no patch). The synconsync version of hdfs is an internal version of hdfs that makes the datanode issue a sync() on the corresponding ext3 block file for every invocation of DFSClient.sync(). This ensures that an hbase transaction is really, really on disk before the put rpc returns to the client. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt, appendNoSyncPut5.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers
[ https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4588: Attachment: configVerify1.txt Convert floating point numbers to integers so that we use integer comparison instead of floating point comparison. This fix has been deployed to some of our 0.92 clusters. The floating point arithmetic to validate memory allocation configurations need to be done as integers -- Key: HBASE-4588 URL: https://issues.apache.org/jira/browse/HBASE-4588 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: dhruba borthakur Priority: Minor Fix For: 0.92.0 Attachments: configVerify1.txt The floating point arithmetic to validate memory allocation configurations need to be done as integers. On our cluster, we had block cache = 0.6 and memstore = 0.2. It was saying this was greater than 0.8 when it is actually equal. Minor bug but annoying nonetheless.
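The rounding problem reported in this issue can be reproduced in a few lines. The sketch below is not the attached patch: the class name, the method names, the exact shape of the check and the 0.2 minimum-free fraction (implied by the 0.8 limit in the report) are assumptions made for illustration. It contrasts a float comparison with a comparison on whole percentages.

```java
// Illustrative only: names and the 0.2 free-memory threshold are assumptions,
// not the code from configVerify1.txt.
class MemoryLimitCheck {
    static final float MIN_FREE = 0.2f; // assumed minimum fraction of heap left free
    static final int PERCENT = 100;

    // Floating point form: 1.0f - (0.2f + 0.6f) lands one float step below 0.2f,
    // so a configuration that fits exactly is reported as too large.
    static boolean floatCheckFails(float memstore, float blockCache) {
        return 1.0f - (memstore + blockCache) < MIN_FREE;
    }

    // Integer form: convert each fraction to a whole percentage first,
    // then compare integers, which has no rounding surprises.
    static boolean intCheckFails(float memstore, float blockCache) {
        int used = Math.round(memstore * PERCENT) + Math.round(blockCache * PERCENT);
        return PERCENT - used < Math.round(MIN_FREE * PERCENT);
    }

    public static void main(String[] args) {
        System.out.println("float check rejects 0.2 + 0.6: " + floatCheckFails(0.2f, 0.6f));
        System.out.println("int check rejects 0.2 + 0.6: " + intCheckFails(0.2f, 0.6f));
    }
}
```

Rounding to the nearest percent is a choice made for this sketch; a real check would need to decide how to treat settings finer than one percent.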
[jira] [Updated] (HBASE-4588) The floating point arithmetic to validate memory allocation configurations need to be done as integers
[ https://issues.apache.org/jira/browse/HBASE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4588: Status: Patch Available (was: Open) The floating point arithmetic to validate memory allocation configurations need to be done as integers -- Key: HBASE-4588 URL: https://issues.apache.org/jira/browse/HBASE-4588 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: dhruba borthakur Priority: Minor Fix For: 0.92.0 Attachments: configVerify1.txt The floating point arithmetic to validate memory allocation configurations need to be done as integers. On our cluster, we had block cache = 0.6 and memstore = 0.2. It was saying this was greater than 0.8 when it is actually equal. Minor bug but annoying nonetheless.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSync5.txt Addressed Kannan's, Ted's and Gary's review comments. Changed the name of the method to rollbackMemstore. The rollback method now compares memstoreTS before deleting the key. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync5.txt, appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt, appendNoSyncPut4.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
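The point of comparing memstoreTS before deleting can be illustrated with a small sketch. Everything here is hypothetical: the class, the String keys and the single-value-per-key map are stand-ins, not the HBase memstore. The idea shown is only that a rollback removes a key when the stored timestamp is still the one written by the failed transaction, so a later writer's value is left alone.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a memstore: one value (its memstoreTS) per key.
class SketchMemstore {
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    // Record that `key` was last written by the transaction with this memstoreTS.
    void put(String key, long memstoreTS) {
        cells.put(key, memstoreTS);
    }

    // Undo a failed write: the entry is removed only if it still carries the
    // memstoreTS of the failed transaction. Returns true if something was removed.
    boolean rollback(String key, long memstoreTS) {
        return cells.remove(key, memstoreTS);
    }

    boolean contains(String key) {
        return cells.containsKey(key);
    }
}
```

With this shape, rolling back transaction 5 after transaction 7 has overwritten the same key is a no-op, which is the behaviour the comment describes.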
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut3.txt 1. The flush of the memstore waits for current transactions to quiesce before committing the flushed files. This should address the problem pointed out by Kannan. 2. The Hlog.syncer() does not throw an exception; instead it causes the regionserver to exit if it is unable to sync to hdfs. The assumption here is that if hbase is unable to write/sync to hdfs, then the simplest and correct error recovery is to exit. (For example, if the memstore flush fails, the regionserver exits.) The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut1.txt This changes the multiPut operation so that the sync to the wal occurs outside the rowlock. This enhancement is done only to HRegion.put(Put[]) because this is the only method that gets invoked from an application. The HRegion.put(Put) is used only by unit tests and should possibly be deprecated. I have attached a unit test. I have not yet run all unit tests, but early feedback on this patch will be very helpful. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSyncPut1.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4528: Attachment: appendNoSyncPut2.txt Incorporated most of Ted's comments and all of Lars's comments. I did not change the order of advancing the rwcc first before releasing the rowlock in the finally-clause because this will occur only in some error case, and in that case it might be better to do things in the normal order. Technically, either way should be fine, but if I am missing something please let me know and I can change it too. In TestParallelPut, I did not fold the two loops of thread-creation and thread-start. The reason is that I would like more concurrency among the threads: if I create and start in the same loop, then it is likely that by the time a thread starts running, the earlier ones would have finished or advanced significantly, thus reducing the time when all threads are running concurrently. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
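The two-loop pattern mentioned for TestParallelPut can be sketched as follows. This is not the test from the patch; the class name, the method and the trivial per-thread work are placeholders. It only shows construction and start kept in separate loops so that the start calls happen back to back.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of creating all threads first and starting them afterwards.
class ParallelStart {
    // Runs numThreads workers and returns how many of them completed.
    static int runAll(int numThreads) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        Thread[] workers = new Thread[numThreads];

        // First loop: construct only. Thread creation cost is paid here.
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(completed::incrementAndGet);
        }
        // Second loop: start back to back, so the workers overlap as much as possible.
        for (Thread t : workers) {
            t.start();
        }
        // Wait for every worker before reading the counter.
        for (Thread t : workers) {
            t.join();
        }
        return completed.get();
    }
}
```

A latch that releases all workers at once would tighten the overlap further; the comment above describes only the simpler two-loop form.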
[jira] [Updated] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4487: Release Note: The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. The increment operation can release the rowlock before sync-ing the Hlog Key: HBASE-4487 URL: https://issues.apache.org/jira/browse/HBASE-4487 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync4.txt, appendNoSync5.txt, appendNoSync6.txt This allows for better throughput when there are hot rows. I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.
[jira] [Updated] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4487: Release Note: The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. There is a fundamental change to the group-commit behaviour: it batches transactions in HBase code before pushing them down to the wal. (was: The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. ) The increment operation can release the rowlock before sync-ing the Hlog Key: HBASE-4487 URL: https://issues.apache.org/jira/browse/HBASE-4487 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.94.0 Attachments: appendNoSync4.txt, appendNoSync5.txt, appendNoSync6.txt This allows for better throughput when there are hot rows. I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.
[jira] [Updated] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4487: Attachment: appendNoSync4.txt The increment operation releases the rowlock before doing the sync to the HLog. This improves performance of increments on hot rows. Introduced method HLog.appendNoSync() that returns a txid. The increment method then releases the rowlock and invokes HLog.sync(txid). The HLog.sync(txid) returns only if all the transactions up to the one identified by that txid have been successfully synced to HDFS. The increment operation can release the rowlock before sync-ing the Hlog Key: HBASE-4487 URL: https://issues.apache.org/jira/browse/HBASE-4487 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSync4.txt This allows for better throughput when there are hot rows. I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.
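The append-then-sync-by-txid flow described above can be sketched roughly as follows. This is a simplified model, not the HLog code from the patch: SketchLog, SketchRegion, the in-memory lists standing in for HDFS, and the single ReentrantLock standing in for a rowlock are all assumptions. Only the method names appendNoSync and sync(txid) come from the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the log; `durable` plays the role of HDFS.
class SketchLog {
    private final List<String> pending = new ArrayList<>(); // appended, not yet synced
    private final List<String> durable = new ArrayList<>(); // synced edits
    private long lastAppendedTxid = 0;
    private long lastSyncedTxid = 0;

    // Buffer the edit and return its transaction id; nothing is synced here.
    synchronized long appendNoSync(String edit) {
        pending.add(edit);
        return ++lastAppendedTxid;
    }

    // Return once every transaction up to txid is durable. One call flushes
    // all pending edits, so later callers often find their txid already covered.
    synchronized void sync(long txid) {
        if (txid <= lastSyncedTxid) {
            return; // an earlier sync already covered this transaction
        }
        durable.addAll(pending);
        pending.clear();
        lastSyncedTxid = lastAppendedTxid;
    }

    synchronized long lastSyncedTxid() { return lastSyncedTxid; }
    synchronized int durableCount() { return durable.size(); }
}

// Hypothetical caller: the row lock covers the update and the append, not the sync.
class SketchRegion {
    private final ReentrantLock rowLock = new ReentrantLock();
    final SketchLog log = new SketchLog();
    private long counter = 0;

    long increment(long amount) {
        long txid;
        long result;
        rowLock.lock();
        try {
            counter += amount;
            result = counter;
            txid = log.appendNoSync("increment " + amount);
        } finally {
            rowLock.unlock(); // released before the slow sync
        }
        log.sync(txid); // other writers to the same row can proceed meanwhile
        return result;
    }
}
```

In this sketch the log's monitor is held for the whole sync, which a real implementation would want to avoid; the point is only the ordering of lock release and sync.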
[jira] [Updated] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4487: Attachment: appendNoSync5.txt Addressed Ted Yu's review comments. The code that does {code} for (Entry e : pending) { writer.append(e); } {code} does not catch exceptions; instead it throws an exception to the caller if any of the edits fail to make it to HDFS. In fact, the HBase regionserver exits if an HDFS write/sync fails; this is expected behaviour. The increment operation can release the rowlock before sync-ing the Hlog Key: HBASE-4487 URL: https://issues.apache.org/jira/browse/HBASE-4487 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSync4.txt, appendNoSync5.txt This allows for better throughput when there are hot rows. I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.
[jira] [Updated] (HBASE-4477) Ability for an application to store metadata into the transaction log
[ https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4477: Attachment: coprocessorPut1.txt Implemented Andrew's suggestion of enhancing the prePut, postPut, preDelete and postDelete apis to take in the Put/Delete object itself. In the process of running tests. Ability for an application to store metadata into the transaction log - Key: HBASE-4477 URL: https://issues.apache.org/jira/browse/HBASE-4477 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: coprocessorPut1.txt, hlogMetadata1.txt MySQL allows an application to store an arbitrary blob along with each transaction in its transaction logs. This JIRA is a request for a similar feature in HBase. The use case is as follows: An application in data center A stores a blob of data along with each transaction. Replication software picks up these blobs from the transaction logs in A and hands them to another instance of the same application running in a remote data center B. The application in B is responsible for applying them to the remote HBase cluster (and also for handling conflict resolution, if any).
[jira] [Updated] (HBASE-4487) The increment operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HBASE-4487: Attachment: appendNoSync6.txt All unit tests pass now (except TestDistributedLogSplitting, TestRollingRestart, TestHTablePool), but I am seeing the same tests fail on trunk, so these failures do not seem to be related to this patch. The one reference to System.err.println() is a printUsage() message that is needed only if you want to run the unit test as a standalone command line utility. There is a single test TestIncrement that creates 100 threads and ensures that all the concurrent increments match the final expected result. There is a benchmark TestHLogBench that measures the performance of the appendNoSync call. The increment operation can release the rowlock before sync-ing the Hlog Key: HBASE-4487 URL: https://issues.apache.org/jira/browse/HBASE-4487 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSync4.txt, appendNoSync5.txt, appendNoSync6.txt This allows for better throughput when there are hot rows. I have seen this change make a single row update improve from 400 increments/sec/server to 4000 increments/sec/server.