[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490394#comment-13490394 ]

Hudson commented on HBASE-5987:
---
Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/])
HBASE-6032 Port HFileBlockIndex improvement from HBASE-5987 (Liyin, Ted, Stack) (Revision 1399513)
Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java

HFileBlockIndex improvement
---
Key: HBASE-5987
URL: https://issues.apache.org/jira/browse/HBASE-5987
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.96.0
Attachments: ASF.LICENSE.NOT.GRANTED--D3237.1.patch, ASF.LICENSE.NOT.GRANTED--D3237.2.patch, ASF.LICENSE.NOT.GRANTED--D3237.3.patch, ASF.LICENSE.NOT.GRANTED--D3237.4.patch, ASF.LICENSE.NOT.GRANTED--D3237.5.patch, ASF.LICENSE.NOT.GRANTED--D3237.6.patch, ASF.LICENSE.NOT.GRANTED--D3237.7.patch, ASF.LICENSE.NOT.GRANTED--D3237.8.patch, screen_shot_of_sequential_scan_profiling.png

We recently found a performance problem: reads are quite slow when multiple requests are reading the same block of data or index. Profiling showed that one of the causes is IdLock contention, which has been addressed in HBASE-5898.
Another issue is that, during the scan process (reSeekTo), the HFileScanner keeps asking the HFileBlockIndex for the data block location of each target key value, even when the target key value is already in the current data block. This makes certain index blocks very hot, especially during a sequential scan.

To solve this issue, we propose the following solutions:

First, we propose to look ahead one more block index entry so that the HFileScanner knows the start key value of the next data block. If the target key value for the scan (reSeekTo) is smaller than the start kv of the next data block, the target key value is very likely in the current data block (if it is not in the current data block, then the start kv of the next data block should be returned; +indexing on the start key has some defects here+), and the scanner shall NOT query the HFileBlockIndex in this case. Conversely, if the target key value is bigger, it shall query the HFileBlockIndex. This improvement should reduce the hotness of the HFileBlockIndex and avoid some unnecessary IdLock contention and index block cache lookups.

Second, we propose to push this idea a little further: the HFileBlockIndex shall index on the last key value of each data block instead of the start key value. The motivation is to solve HBASE-4443 (avoid seeking to the previous block when the key you are interested in is the first one of a block) as well as +the defects mentioned above+. For example, if the target key value is smaller than the start key value of data block N, there is no way to know for sure whether the target key value is in data block N or N-1, so the scan has to seek from data block N-1. However, if the block index is based on the last key value of each data block and the target key value is between the last key values of data block N-1 and data block N, then the target key value is in data block N for sure.
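The two proposals above can be illustrated with a minimal Java sketch. This is not the HBASE-5987 patch: the class BlockIndexSketch, its method names, and the sorted String arrays standing in for real index blocks are all assumptions made for illustration. It also assumes forward-only reseeks, i.e. the target is never before the scanner's current position.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class BlockIndexSketch {
    private final String[] startKeys; // first key of each data block, sorted
    private int currentBlock;         // block the scanner is positioned in
    private String nextStartKey;      // lookahead: first key of the next block, null if none
    private int indexLookups;         // how often the simulated index was consulted

    BlockIndexSketch(String[] startKeys) {
        this.startKeys = startKeys.clone();
        seekViaIndex(this.startKeys[0]);
    }

    /** Full index lookup; also caches the start key of the following block. */
    private void seekViaIndex(String target) {
        indexLookups++;
        int pos = Arrays.binarySearch(startKeys, target);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // candidate block is the one just before the insertion point.
        currentBlock = pos >= 0 ? pos : Math.max(-pos - 2, 0);
        nextStartKey = currentBlock + 1 < startKeys.length
                ? startKeys[currentBlock + 1] : null;
    }

    /** Forward reseek: consults the index only when the target passes the lookahead key. */
    int reseekTo(String target) {
        if (nextStartKey != null && target.compareTo(nextStartKey) >= 0) {
            seekViaIndex(target);
        }
        return currentBlock;
    }

    int indexLookups() {
        return indexLookups;
    }

    /**
     * Second proposal: with an index on each block's LAST key, the answer is
     * the first block whose last key is >= target, or -1 past the end.
     */
    static int blockByLastKey(String[] lastKeys, String target) {
        int pos = Arrays.binarySearch(lastKeys, target);
        int block = pos >= 0 ? pos : -pos - 1;
        return block < lastKeys.length ? block : -1;
    }

    public static void main(String[] args) {
        BlockIndexSketch scanner = new BlockIndexSketch(new String[] {"a", "g", "m"});
        for (String target : new String[] {"b", "c", "h", "z"}) {
            System.out.println(target + " -> block " + scanner.reseekTo(target)
                    + ", index lookups so far: " + scanner.indexLookups());
        }
        // "fz" sorts after the last key of block 0 and before the last key of block 1.
        System.out.println("fz -> block "
                + blockByLastKey(new String[] {"f", "l", "r"}, "fz"));
    }
}
```

In this sketch, reseeks that stay below the cached start key of the next block never touch the index, which is the hot-spot reduction the first proposal aims for. The blockByLastKey helper shows why a last-key index needs no fallback to block N-1: a target that sorts after the last key of block N-1 cannot be in that block.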
As long as HBase only supports forward scans, it makes more sense to index on the last key value than on the start key value.

Thanks to Kannan and Mikhail for the insightful discussions and suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478636#comment-13478636 ]

Lars Hofhansl commented on HBASE-5987:
--
@binlijin: We're doing the porting in HBASE-6032.
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478643#comment-13478643 ]

binlijin commented on HBASE-5987:
-
@Lars Hofhansl: Thank you very much.
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478679#comment-13478679 ]

Hudson commented on HBASE-5987:
---
Integrated in HBase-0.94 #539 (See [https://builds.apache.org/job/HBase-0.94/539/])
HBASE-6032 Port HFileBlockIndex improvement from HBASE-5987 (Liyin, Ted, Stack) (Revision 1399513)
Result = FAILURE
larsh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473317#comment-13473317 ]

binlijin commented on HBASE-5987:
-
Should we backport this issue to 0.94-branch?
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284578#comment-13284578 ]

Hudson commented on HBASE-5987:
---
Integrated in HBase-TRUNK #2941 (See [https://builds.apache.org/job/HBase-TRUNK/2941/])
HBASE-6032 Port HFileBlockIndex improvement from HBASE-5987 (Revision 1343413)
Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284582#comment-13284582 ]

Hudson commented on HBASE-5987:
---
Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #30 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/30/])
HBASE-6032 Port HFileBlockIndex improvement from HBASE-5987 (Revision 1343413)
Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279044#comment-13279044 ]

Mikhail Bautin commented on HBASE-5987:
---
@Ted: we put [89-fb] in the 89-fb versions of our code reviews for a particular JIRA, and omit them from trunk versions of code reviews for the same JIRA.
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13279065#comment-13279065 ]

Zhihong Yu commented on HBASE-5987:
---
Which branches would the backport be prepared for? I wonder whether the code introduced so far in this JIRA would be kept when part 2 is implemented.

HFileBlockIndex improvement
---
Key: HBASE-5987
URL: https://issues.apache.org/jira/browse/HBASE-5987
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, D3237.4.patch, D3237.5.patch, D3237.6.patch, D3237.7.patch, D3237.8.patch, screen_shot_of_sequential_scan_profiling.png

We recently found a performance problem: reads are quite slow when multiple requests read the same data or index block. Profiling shows that one cause is IdLock contention, which has been addressed in HBASE-5898. Another is that the HFileScanner keeps asking the HFileBlockIndex for the data block location of each target key value during a scan (reSeekTo), even when the target key value is already in the current data block. This makes certain index blocks very HOT, especially during a sequential scan.

To solve this issue, we propose the following solutions:

First, we propose to look ahead one more block index entry so that the HFileScanner knows the start key value of the next data block. If the target key value for the scan (reSeekTo) is smaller than the start kv of the next data block, the target key value is very likely in the current data block (if it is not in the current data block, the start kv of the next data block should be returned; +indexing on the start key has some defects here+), and the scanner shall NOT query the HFileBlockIndex in this case. Conversely, if the target key value is bigger, it shall query the HFileBlockIndex. This improvement should reduce the hotness of the HFileBlockIndex and avoid some unnecessary IdLock contention and index block cache lookups.

Second, we propose to push this idea a little further: the HFileBlockIndex shall index on the last key value of each data block instead of the start key value. The motivation is to solve HBASE-4443 (avoid seeking to the previous block when the key you are interested in is the first one of a block) as well as +the defects mentioned above+. For example, if the target key value is smaller than the start key value of data block N, there is no way to be sure whether the target key value is in data block N or N-1, so the scanner has to seek from data block N-1. However, if the block index is based on the last key value of each data block and the target key value is between the last key values of data blocks N-1 and N, then the target key value must be in data block N. As long as HBase only supports forward scans, it makes more sense to index on the last key value than on the start key value.

Thanks Kannan and Mikhail for the insightful discussions and suggestions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
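The first proposal in the issue description (look ahead one block index entry so that reSeekTo can skip the index) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HBase implementation: keys are plain ints, a "block" is a sorted int array, and the class name, the `reseekTo` method, and the `indexLookups` counter are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;

/** Toy model (hypothetical, not HBase code): blocks are sorted int arrays, indexed by first key. */
class LookaheadScannerSketch {
    private final int[][] blocks;
    private final int[] firstKeys;      // stand-in for the block index (first key of each block)
    private int currentBlock;
    private Integer nextIndexedKey;     // first key of the next block, or null if there is none
    int indexLookups = 0;               // counts how often the block index is consulted

    LookaheadScannerSketch(int[][] blocks) {
        this.blocks = blocks;
        this.firstKeys = new int[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            firstKeys[i] = blocks[i][0];
        }
        moveTo(0);
    }

    /** Positions the scanner on a block and remembers the lookahead key for it. */
    private void moveTo(int block) {
        currentBlock = block;
        nextIndexedKey = block + 1 < blocks.length ? firstKeys[block + 1] : null;
    }

    /** Index lookup: the last block whose first key is <= target. */
    private int lookupBlock(int target) {
        indexLookups++;
        int pos = Arrays.binarySearch(firstKeys, target);
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    /** Forward reseek: returns the smallest key >= target, or -1 if there is none. */
    int reseekTo(int target) {
        // Only consult the index when the target is at or beyond the next block's first key.
        if (nextIndexedKey != null && target >= nextIndexedKey) {
            moveTo(lookupBlock(target));
        }
        for (int key : blocks[currentBlock]) {
            if (key >= target) {
                return key;
            }
        }
        // Target falls in the gap after this block: the next block's first key is the answer.
        if (nextIndexedKey == null) {
            return -1;
        }
        int next = nextIndexedKey;
        moveTo(currentBlock + 1);
        return next;
    }

    public static void main(String[] args) {
        LookaheadScannerSketch s =
            new LookaheadScannerSketch(new int[][] {{1, 3, 5}, {10, 12}, {20, 25}});
        for (int target : new int[] {1, 2, 4, 7, 11, 21}) {
            System.out.println(target + " -> " + s.reseekTo(target));
        }
        System.out.println("index lookups: " + s.indexLookups);  // prints "index lookups: 1"
    }
}
```

In this toy run, six reseeks cause a single index lookup; without the lookahead key every reseek would go through `lookupBlock`.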
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1327#comment-1327 ]

Phabricator commented on HBASE-5987:
---
mbautin has closed the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

REVISION DETAIL
  https://reviews.facebook.net/D3237

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1339581

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13278053#comment-13278053 ]

Liyin Tang commented on HBASE-5987:
---
Hi Ted, I think we should use the same JIRA number for the same fix. It is not necessary to create another JIRA to port it back to Apache trunk. Actually, we are proposing two solutions in this JIRA and I haven't finished the second part yet, which is to index each data block on its (last_key + 1) instead of its start key. So this JIRA should not be closed. Thanks.
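The difference between indexing on the start key and indexing on the last key, as discussed in this thread, can be shown with a small lookup sketch. It assumes int keys that are unique and sorted, and it indexes on the plain last key rather than on (last_key + 1); the class and method names are hypothetical and only serve the illustration.

```java
import java.util.Arrays;

/** Toy comparison (hypothetical, not HBase code) of start-key and last-key block indexes. */
class LastKeyIndexSketch {
    /** Start-key index: the last block whose first key is <= target (block 0 if there is none). */
    static int blockByFirstKey(int[] firstKeys, int target) {
        int pos = Arrays.binarySearch(firstKeys, target);
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    /** Last-key index: the first block whose last key is >= target, or -1 if target is past the end. */
    static int blockByLastKey(int[] lastKeys, int target) {
        int pos = Arrays.binarySearch(lastKeys, target);
        int block = pos >= 0 ? pos : -pos - 1;
        return block < lastKeys.length ? block : -1;
    }

    public static void main(String[] args) {
        // Three blocks holding {1, 3, 5}, {10, 12} and {20, 25}.
        int[] firstKeys = {1, 10, 20};
        int[] lastKeys = {5, 12, 25};
        int target = 7;  // falls in the gap between block 0 and block 1

        // The start-key index points at block 0, which holds nothing >= 7,
        // so the scanner has to read through block 0 before reaching block 1.
        System.out.println("by first key: block " + blockByFirstKey(firstKeys, target));  // block 0

        // The last-key index points directly at block 1, where the answer (10) lives.
        System.out.println("by last key:  block " + blockByLastKey(lastKeys, target));    // block 1
    }
}
```

With the last-key index, the smallest key greater than or equal to the target is always inside the returned block, which is the property the second proposal relies on for forward scans.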
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13278059#comment-13278059 ]

Zhihong Yu commented on HBASE-5987:
---
The second part of the fix is an incompatible change. It would be better to tackle it in another JIRA. The code review subject had '[89-fb]' in it, so I thought this was targeting the 0.89-fb branch. Feel free to reopen this issue for the backport.
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276513#comment-13276513 ]

Zhihong Yu commented on HBASE-5987:
---
@Liyin: Thanks for the quick turnaround. Please let us know the performance improvement compared to what you collected on May 11th.

Attachments: D3237.1.patch, D3237.2.patch, screen_shot_of_sequential_scan_profiling.png
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276941#comment-13276941 ]

Phabricator commented on HBASE-5987:
---
mbautin has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

  Looks good! A few minor comments inline. Also, please submit the diff with lint (using arc diff --preview instead of arc diff --only).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/HConstants.java:545 Please add a comment that the actual value is irrelevant because this is always compared by reference.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:437-440 This documentation is still confusing. Is i the ith position, or is the actual key the ith position? I would say i is the position and the returned key is the key at the ith position.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:413 Clarify the meaning of "is equal", i.e. that it must be exactly the same object, not just an equal byte array.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:63 This is unnecessary (we don't use compression by default).
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:77 It is not schemMetricSnapshot, it is schemaMetricSnapshot ("schem" is not a word).

REVISION DETAIL
  https://reviews.facebook.net/D3237

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

Attachments: D3237.1.patch, D3237.2.patch, screen_shot_of_sequential_scan_profiling.png
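The review comments above about a constant that is "always compared by reference" can be illustrated as follows. This is a minimal sketch, assuming a sentinel byte array that marks "no next indexed key"; the constant is a local stand-in declared in the example itself, and `hasNextIndexedKey` is a hypothetical helper, not a method from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch (hypothetical names): a sentinel byte[] that is only ever compared with ==. */
class SentinelKeySketch {
    /** Stand-in sentinel. Its bytes are irrelevant because only the reference is compared. */
    static final byte[] NO_NEXT_INDEXED_KEY =
        "NO_NEXT_INDEXED_KEY".getBytes(StandardCharsets.UTF_8);

    /** True unless the caller passed the sentinel object itself. */
    static boolean hasNextIndexedKey(byte[] nextIndexedKey) {
        return nextIndexedKey != NO_NEXT_INDEXED_KEY;
    }

    public static void main(String[] args) {
        // A real key that happens to contain the same bytes as the sentinel.
        byte[] realKey = "NO_NEXT_INDEXED_KEY".getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(realKey, NO_NEXT_INDEXED_KEY));  // true: same content
        System.out.println(hasNextIndexedKey(realKey));                   // true: different object
        System.out.println(hasNextIndexedKey(NO_NEXT_INDEXED_KEY));       // false: the sentinel
    }
}
```

Comparing by reference means a genuine key with identical bytes is never mistaken for the sentinel, which is why the reviewer asks for the comparison semantics to be documented.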
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277001#comment-13277001 ]

Phabricator commented on HBASE-5987:
---
mbautin has accepted the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

  Just one minor comment (please address on commit).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:413 HContants -> HConstants (missed an s)

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, screen_shot_of_sequential_scan_profiling.png
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277017#comment-13277017 ]

Phabricator commented on HBASE-5987:
---
todd has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

  It would be nice to have a simple benchmark, e.g. load a million rows and time "count 'table', { CACHE => 1000 }" from the shell with and without the patch.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java:23 typo: references wrong class name here
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockWithScanInfo.java:28 could do with a short javadoc, e.g.:

    /**
     * The first key in the next block following this one in the HFile.
     * If this key is unknown, this is reference-equal with HConstants.NO_NEXT_INDEXED_KEY
     */

  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:526 are you guaranteed that firstKey.arrayOffset() == 0 here? I would have assumed firstKey could be an array slice

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, screen_shot_of_sequential_scan_profiling.png
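The arrayOffset() question raised in the review above concerns a general java.nio pitfall, which the following sketch reproduces in isolation. It is not the code under review: the class, the two helper methods, and the sample bytes are hypothetical and only demonstrate why a sliced ByteBuffer must be read at arrayOffset() + position().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch (hypothetical names): reading a key from a ByteBuffer that is a slice of a larger array. */
class ArrayOffsetSketch {
    /** Buggy variant: assumes the buffer's content starts at index 0 of the backing array. */
    static String readIgnoringOffset(ByteBuffer buf) {
        return new String(buf.array(), 0, buf.remaining(), StandardCharsets.UTF_8);
    }

    /** Correct variant: the content starts at arrayOffset() + position() in the backing array. */
    static String readWithOffset(ByteBuffer buf) {
        return new String(buf.array(), buf.arrayOffset() + buf.position(),
            buf.remaining(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] block = "hdr:key1".getBytes(StandardCharsets.UTF_8);
        // A slice covering only the last four bytes of the backing array.
        ByteBuffer firstKey = ByteBuffer.wrap(block, 4, 4).slice();

        System.out.println(firstKey.arrayOffset());        // 4
        System.out.println(readIgnoringOffset(firstKey));  // hdr:  (wrong bytes)
        System.out.println(readWithOffset(firstKey));      // key1
    }
}
```

When the buffer happens to start at index 0 of its backing array, both variants return the same bytes, which is one way such a bug can stay hidden from tests.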
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277090#comment-13277090 ]

Phabricator commented on HBASE-5987:
---
todd has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

  Thanks for fixing. I'm surprised the unit tests weren't failing before. Is that because the ByteBuffer usually does have arrayOffset() == 0, so the bug wasn't actually causing a problem? Or do we need more test coverage?

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, D3237.4.patch, screen_shot_of_sequential_scan_profiling.png
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277131#comment-13277131 ]

Phabricator commented on HBASE-5987:
---
Liyin has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.

  I think we have not tested a seekBefore to the previous block combined with a reSeekTo within that previous block. I shall create a unit test to cover that.

REVISION DETAIL
  https://reviews.facebook.net/D3237

BRANCH
  HBASE-5987-fb

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu

Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, D3237.4.patch, D3237.5.patch, screen_shot_of_sequential_scan_profiling.png
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277162#comment-13277162 ] Phabricator commented on HBASE-5987: mbautin has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement. The new test looks good. REVISION DETAIL https://reviews.facebook.net/D3237 BRANCH HBASE-5987-fb To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277166#comment-13277166 ] Phabricator commented on HBASE-5987: Liyin has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java:110 should be c and g REVISION DETAIL https://reviews.facebook.net/D3237 BRANCH HBASE-5987-fb To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277165#comment-13277165 ] Phabricator commented on HBASE-5987: tedyu has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java:110 This doesn't seem to match the code on line 111. REVISION DETAIL https://reviews.facebook.net/D3237 BRANCH HBASE-5987-fb To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277218#comment-13277218 ] Hadoop QA commented on HBASE-5987: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12527710/D3237.6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 14 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1902//console This message is automatically generated.
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277309#comment-13277309 ] Phabricator commented on HBASE-5987: tedyu has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:2 No year, please. src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:41 This class doesn't extend HBaseTestCase but uses methods from HBaseTestCase. It would be better to not reference HBaseTestCase. REVISION DETAIL https://reviews.facebook.net/D3237 BRANCH HBASE-5987-fb To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277482#comment-13277482 ] Phabricator commented on HBASE-5987: tedyu has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement. This patch shouldn't include HTableMultiplexer changes, right ? REVISION DETAIL https://reviews.facebook.net/D3237 BRANCH HBASE-5776 To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276211#comment-13276211 ] Phabricator commented on HBASE-5987: tedyu has commented on the revision [jira][89-fb] [HBASE-5987] HFileBlockIndex improvement.
INLINE COMMENTS
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:411 'is to keep' -> 'keeps'
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:415 'it means it' -> 'it means that'
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:205 Please add javadoc for the last three parameters
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:208 Can this method be named getDataBlockInfo() ? For 'seekTo', I think DataBlock would be the target, not DataBlockInfo. See comment below w.r.t. naming of DataBlockInfo
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:196 'other attributes' -> 'additional attributes' ?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:293 'Only ' can be removed.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockInfo.java:2 No year, please.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:306 Can we use builder pattern to fill out nextIndexedKey ?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockInfo.java:26 Would HFileBlockWithInfo be a better name ?
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:480 Should this be '< 0' ?
src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:2 Please remove year.
src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:44 Please add test category.
REVISION DETAIL https://reviews.facebook.net/D3237 To: Kannan, mbautin, Liyin Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276352#comment-13276352 ]

Phabricator commented on HBASE-5987:

mbautin has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex improvement".

  Mostly discussed offline with Liyin. Comments are inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockInfo.java:26 I would suggest that we reflect the fact that the block is being scanned in the class name. Perhaps BlockWithScanInfo.
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:224 Add a space before -
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java:435 This description is misleading. i is not really the "ith indexed key". I would say "the position of the index key to retrieve, starting at 0".
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:480 Discussed with Liyin offline. "< 1" seems fine, but there should definitely be a comment describing what happens in case compared == 0 (when the key being searched is the same as the first key of the next block). In that case we are relying on loadBlockAndSeekToKey positioning the scanner just before the key we are interested in, and on StoreFileScanner calling HFileScanner.next() to bring us to the first key we are interested in, potentially in the next block.

  Also, in case nextIndexedKey == NO_NEXT_INDEXED_KEY, we should do the same thing as if compared < 1 (also discussed with Liyin offline). Therefore, the overall condition should be along the lines of:

    if (this.nextIndexedKey != null &&
        (this.nextIndexedKey == NO_NEXT_INDEXED_KEY ||
         reader.getComparator().compare(key, offset, length,
             nextIndexedKey, 0, nextIndexedKey.length) < 1)) {
      ...
    }

  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java:205 You probably don't want to enhance the deprecated HBaseTestCase class. Instead, try to add new functionality to HBaseTestingUtility.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:44 @tedyu: we don't categorize our tests in 89-fb. This diff will be ported to trunk separately.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:45 Please don't write new tests inheriting from HBaseTestCase. Use HBaseTestingUtility and HTestConst instead.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:82 schemMetricSnapShot -> schemaMetricSnapshot
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:91 Shouldn't this be the following?

    while (s.next(results) || !results.isEmpty()) {
      results.clear();
    }

  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:103 Why are we assigning the return value of verifyDataAndIndexBlockRead to this variable? It is not used anywhere.
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksScanned.java:107 Are we using the return value anywhere?

REVISION DETAIL
  https://reviews.facebook.net/D3237

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu
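The reseek condition suggested in the review comment on HFileReaderV2.java:480 can be sketched in isolation as below. This is only a minimal illustration, not the patch itself: the class name ReseekDecision, the helper canStayInCurrentBlock, the plain byte-wise comparator, and the empty-array sentinel are all assumptions made for the example; only the shape of the condition (null check, sentinel check, "compared < 1") comes from the comment.

```java
class ReseekDecision {
    // Hypothetical sentinel meaning "the current block is the last data block".
    // It is compared by identity, as in the condition quoted in the review.
    static final byte[] NO_NEXT_INDEXED_KEY = new byte[0];

    // Unsigned lexicographic byte comparison; a stand-in for reader.getComparator().
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns true when the scanner may keep seeking inside the current block
     * without querying the block index.
     * nextIndexedKey == null models "no lookahead information available".
     */
    static boolean canStayInCurrentBlock(byte[] key, byte[] nextIndexedKey) {
        return nextIndexedKey != null
            && (nextIndexedKey == NO_NEXT_INDEXED_KEY
                // compared == 0 is included: the scanner is positioned just
                // before the key and a later next() call moves to the next block.
                || compare(key, nextIndexedKey) < 1);
    }

    public static void main(String[] args) {
        byte[] next = {'m'};
        System.out.println(canStayInCurrentBlock(new byte[] {'a'}, next));
        System.out.println(canStayInCurrentBlock(new byte[] {'z'}, next));
        System.out.println(canStayInCurrentBlock(new byte[] {'z'}, NO_NEXT_INDEXED_KEY));
    }
}
```

The sketch only models the decision; in the real scanner the "stay" branch would go on to seek within the already loaded block, and the other branch would fall back to the block index.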
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276355#comment-13276355 ]

Phabricator commented on HBASE-5987:

Liyin has commented on the revision "[jira][89-fb] [HBASE-5987] HFileBlockIndex improvement".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:480 I missed this NO_NEXT_INDEXED_KEY! Thanks, Mikhail, for this insightful comment!

REVISION DETAIL
  https://reviews.facebook.net/D3237

To: Kannan, mbautin, Liyin
Cc: JIRA, todd, tedyu
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273088#comment-13273088 ]

Liyin Tang commented on HBASE-5987:
---

Attached a screenshot of sequential scan profiling here. It shows that 82% of seek time is spent querying the block index and 67% of seek time is spent acquiring the IdLock for the block index. Avoiding unnecessary block index queries looks like a big win for scan performance.
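To give a rough feel for how many block index queries a sequential scan can avoid, here is a toy count under stated assumptions: integer keys 0..numKeys-1, fixed-size blocks, and an index that stores each block's start key. The class and method names (IndexLookupCount, findBlock, countLookups) are hypothetical, and the toy uses a strict "key < next start key" check because it has no scanner that could advance with next().

```java
import java.util.Arrays;

class IndexLookupCount {
    // Index of the last block whose start key is <= key (binary search over start keys).
    static int findBlock(int[] blockStartKeys, int key) {
        int pos = Arrays.binarySearch(blockStartKeys, key);
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Reseeks to every key in order and returns how many index lookups were needed. */
    static int countLookups(int numKeys, int keysPerBlock, boolean lookahead) {
        int numBlocks = (numKeys + keysPerBlock - 1) / keysPerBlock;
        int[] starts = new int[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            starts[b] = b * keysPerBlock;
        }

        int lookups = 0;
        int current = -1; // no block loaded yet
        for (int key = 0; key < numKeys; key++) {
            // With lookahead, the scanner remembers the start key of the next block
            // and skips the index while the target is still below it.
            boolean inCurrentBlock = lookahead && current >= 0
                && (current == numBlocks - 1 || key < starts[current + 1]);
            if (!inCurrentBlock) {
                current = findBlock(starts, key);
                lookups++;
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        System.out.println("without lookahead: " + countLookups(1000, 100, false));
        System.out.println("with lookahead:    " + countLookups(1000, 100, true));
    }
}
```

In this toy model the lookup count drops from one per key to one per block. Real savings depend on block size, key distribution, and scan pattern.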
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273000#comment-13273000 ]

Todd Lipcon commented on HBASE-5987:

Nice stuff Liyin. I was looking at scan performance a bit last week as well and came to similar conclusions that the reseeks were pretty expensive for this reason.

Another thing I noticed is that our CPU cache behavior is pretty bad when the individual KVs are large. When I profiled L2 cache misses in oprofile, I saw a bunch on the call to read the memstoreTS, presumably because it fell on a different cache line than the rest of the KV. In my case, the KVs were just over 128 bytes (2 cache lines), including their header fields, lengths etc. So the access pattern looked like:

  - hit cacheline 0 for kv0 header
  - hit cacheline 2 for kv0 memstoreTS
  - hit cacheline 0 repeatedly to do KV comparison
  - hit cacheline 2 for kv1's header
  - hit cacheline 4 for kv1's memstoreTS
  - hit cacheline 2 for kv1 data comparison
  etc.

For whatever reason, my CPU wasn't quite smart enough to kick prefetching in on this access pattern. I tried recompiling JDK7 with the Unsafe.prefetchRead intrinsic, but couldn't get any noticeable improvement with it.

So I think for better performance, we need some better in-memory layout for the HFile blocks, so we can get O(lg n) reseeks instead of O(n), for example. Have you seen the same?
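One way to picture the "O(lg n) reseeks instead of O(n)" idea is a block that carries a table of entry offsets next to the serialized entries, so a reseek can binary-search instead of walking entries one by one. The sketch below is not HBase's block format; the class OffsetIndexedBlock, its [int keyLength][key bytes] entry layout, and the seek method are assumptions chosen only to make the idea concrete. It also does nothing about the cache-line behavior described above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class OffsetIndexedBlock {
    private final ByteBuffer data; // serialized entries: [int keyLength][key bytes]
    private final int[] offsets;   // offsets[i] = start position of entry i in data

    /** Keys must already be sorted in unsigned byte order. */
    OffsetIndexedBlock(String[] sortedKeys) {
        int size = 0;
        for (String k : sortedKeys) {
            size += 4 + k.getBytes(StandardCharsets.UTF_8).length;
        }
        data = ByteBuffer.allocate(size);
        offsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            byte[] b = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = data.position();
            data.putInt(b.length).put(b);
        }
    }

    // Compares the key stored at the given entry with the target, byte by byte.
    private int compareAt(int entry, byte[] target) {
        int off = offsets[entry];
        int len = data.getInt(off);
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int d = (data.get(off + 4 + i) & 0xff) - (target[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return len - target.length;
    }

    /** Index of the first entry whose key is >= target, or the entry count if none. */
    int seek(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = offsets.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareAt(mid, t) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The trade-off in such a layout is extra memory for the offset table (4 bytes per entry here) in exchange for logarithmic seeks within a block.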
[jira] [Commented] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273031#comment-13273031 ]

Liyin Tang commented on HBASE-5987:
---

Nice summary! I haven't profiled the CPU in detail and shall try it :)

We found this problem based on the following 2 experiments:

1) A single HBase client sequentially scans 8G of kvs from one region. Each kv is approximately 50 bytes and all of them are 100% cached in the block cache. It takes approximately 4 mins to finish.
2) 20 HBase clients issue the same scan in parallel against the same set of 8G cached kvs. The finish time varies from 20 min to 50 min.

On the region server, the network and disk are not busy and CPU usage increased by only 1%. Most of the IPC threads serving the next call were waiting for the IdLock to read the HFileBlockIndex.
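The figures from the two experiments can be turned into per-client slowdown factors with a few lines of arithmetic. The helper names below are hypothetical, and the KV count assumes "8G" means 8 GiB and that every KV is exactly 50 bytes, which the experiment description only gives approximately.

```java
class ScanSlowdown {
    /** Ratio of a parallel client's finish time to the single-client baseline. */
    static double slowdown(double parallelMinutes, double singleClientMinutes) {
        return parallelMinutes / singleClientMinutes;
    }

    /** Approximate KV count for a data set, assuming a fixed KV size. */
    static long approxKvCount(long totalBytes, long bytesPerKv) {
        return totalBytes / bytesPerKv;
    }

    public static void main(String[] args) {
        long totalBytes = 8L * 1024 * 1024 * 1024; // assumption: "8G" = 8 GiB
        System.out.println("approx KVs scanned: " + approxKvCount(totalBytes, 50));
        System.out.println("slowdown at 20 min: " + slowdown(20, 4));
        System.out.println("slowdown at 50 min: " + slowdown(50, 4));
    }
}
```

Under these assumptions each parallel client takes roughly 5 to 12.5 times as long as the single client, while CPU usage barely moves, which is consistent with the threads waiting on the IdLock rather than doing work.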