[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943237#comment-16943237 ]

Jean-Marc Spaggiari commented on HBASE-16213:
---------------------------------------------

Is there any follow-up JIRA for V2 and V3?

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: Lijin Bin
>            Assignee: Lijin Bin
>            Priority: Major
>             Fix For: 1.4.0, 2.0.0
>
>         Attachments: HBASE-16213-master_v1.patch,
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch,
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch,
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch,
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch,
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png,
> hfile_block_performance.pptx, hfile_block_performance2.pptx,
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock stores cells sequentially. Currently, to get a row from the block,
> it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset with the data, so it can find
> the exact row with binarySearch.
> I used EncodedSeekPerformanceTest to test the performance.
> First, use YCSB to write 1,000,000 (100w) rows; every row has only one qualifier, and
> valueLength=16B/64B/256B/1k.
> Then use EncodedSeekPerformanceTest to test random reads of 10,000 (1w) or 1,000,000 (100w) rows, and
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
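The idea in the issue description (keep each row's start offset next to the data so a lookup can binary-search instead of scanning from the first cell) can be illustrated with a small sketch. This is not the RowIndexCodecV1 on-disk format: the class name `RowIndexSketch`, the `[int length][row bytes]` layout, and both method names are assumptions made only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; layout and names are hypothetical, not HBase's. */
final class RowIndexSketch {
    private final ByteBuffer data;   // rows back to back, each as [int length][row bytes]
    private final int[] rowOffsets;  // start offset of every row inside data

    /** Rows must be supplied in ascending byte order. */
    RowIndexSketch(List<String> sortedRows) {
        int size = 0;
        for (String r : sortedRows) {
            size += 4 + r.getBytes(StandardCharsets.UTF_8).length;
        }
        data = ByteBuffer.allocate(size);
        rowOffsets = new int[sortedRows.size()];
        int i = 0;
        for (String r : sortedRows) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            rowOffsets[i++] = data.position(); // remember where this row starts
            data.putInt(b.length).put(b);
        }
    }

    private byte[] rowAt(int offset) {
        int len = data.getInt(offset);
        byte[] row = new byte[len];
        for (int j = 0; j < len; j++) {
            row[j] = data.get(offset + 4 + j);
        }
        return row;
    }

    /** Sequential lookup: walk from the first row. Returns the row's offset or -1. */
    int linearSeek(String row) {
        byte[] key = row.getBytes(StandardCharsets.UTF_8);
        int offset = 0;
        while (offset < data.capacity()) {
            int cmp = Arrays.compareUnsigned(rowAt(offset), key);
            if (cmp == 0) return offset;
            if (cmp > 0) return -1;            // passed the place where the key would be
            offset += 4 + data.getInt(offset); // skip to the next row
        }
        return -1;
    }

    /** Indexed lookup: binary search over the stored row offsets. */
    int binarySeek(String row) {
        byte[] key = row.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = Arrays.compareUnsigned(rowAt(rowOffsets[mid]), key);
            if (cmp == 0) return rowOffsets[mid];
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        RowIndexSketch block = new RowIndexSketch(List.of("aaa", "aab", "aac", "aad", "d"));
        System.out.println(block.linearSeek("aad") == block.binarySeek("aad"));
    }
}
```

Both lookups return the same offset; the difference is that the indexed one touches O(log n) rows rather than O(n), at the cost of storing one extra offset per row, which is presumably why the description also records dataSize versus dataWithMetaSize.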
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1693#comment-1693 ]

stack commented on HBASE-16213:
-------------------------------

HBASE-23055 will make it possible to set DataBlockEncoding.ROW_INDEX_V1 on hbase:meta.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319822#comment-16319822 ]

stack commented on HBASE-16213:
-------------------------------

bq. Not actually but yes it worth a try boss stack

I'll report back my findings. Thanks [~carp84]
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319613#comment-16319613 ]

Yu Li commented on HBASE-16213:
-------------------------------

bq. Did you fellows deploy this on hbase:meta?

Not actually but yes it worth a try boss [~stack]
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318828#comment-16318828 ]

stack commented on HBASE-16213:
-------------------------------

[~carp84] + [~aoxiang] Did you fellows deploy this on hbase:meta? Thanks.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143316#comment-16143316 ]

Yu Li commented on HBASE-16213:
-------------------------------

bq. I think V2 can replace V1, V2 have space optimize and the same seek perf.

Let's get HBASE-16594 in then if no objections ([~anoop.hbase] please let us know if any comments here sir, thanks), then people like our mighty [~yangzhe1991] could use it and further improve it (smile).
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785059#comment-15785059 ]

binlijin commented on HBASE-16213:
----------------------------------

Looks like a good idea; I have not thought about it deeply.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785035#comment-15785035 ]

Phil Yang commented on HBASE-16213:
-----------------------------------

bq. because it is need to record how many versions there are.

We have to read Cells one by one within a row? If so, there is no need to know how many versions we have; can we just use the previous qualifier if the current Cell has no qualifier? Correct me if I am wrong. Thanks.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785001#comment-15785001 ]

binlijin commented on HBASE-16213:
----------------------------------

Yes, they must be in one family.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785000#comment-15785000 ]

binlijin commented on HBASE-16213:
----------------------------------

I think V2 can replace V1; V2 has the space optimization and the same seek perf.

bq. create several kinds of structures?

I think it is not a problem.

bq. And with the idea of V2 we can also save qualifier only once for versions of cell, right?

I think it is hard to see, because it needs to record how many versions there are.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784990#comment-15784990 ]

Phil Yang commented on HBASE-16213:
-----------------------------------

And for an HFile, all cells must be in one family? Am I missing something? Thanks.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784969#comment-15784969 ]

Phil Yang commented on HBASE-16213:
-----------------------------------

Hi [~aoxiang]
Thanks for your important work. Does V2 have any weakness compared with V1? According to their formats it seems that V2 only has advantages? :)
1.4 and 2.0 have not been released, so can we just improve this structure rather than create several kinds of structures?
And with the idea of V2 we can also save the qualifier only once for the versions of a cell, right?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451820#comment-15451820 ]

Yu Li commented on HBASE-16213:
-------------------------------

Ok, makes sense, thanks for the quick response sir [~mantonov] :-)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451649#comment-15451649 ]

Mikhail Antonov commented on HBASE-16213:
-----------------------------------------

[~carp84] this is nice work for sure, I just think it's a bit too late for new features like this one to go to 1.3. Let's keep it in 1.4 (branch-1) for now; then, depending on more prod testing and numbers on how much we can win, we can cherry-pick it for 1.3.1.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451208#comment-15451208 ]

Yu Li commented on HBASE-16213:
-------------------------------

Maybe a little bit late, but I'm wondering whether this is a good one for 1.3? It's a good but add-on feature which makes few changes to core code. [~mantonov]
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447506#comment-15447506 ]

Hudson commented on HBASE-16213:
--------------------------------

FAILURE: Integrated in Jenkins build HBase-1.4 #378 (See [https://builds.apache.org/job/HBase-1.4/378/])
HBASE-16213 A new HFileBlock structure for fast random get. (binlijin) (anoopsamjohn: rev c899897bc8dc4a7eccc9e2a80fd05ad55654f18e)
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteArrayOutputStream.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexSeekerV1.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexCodecV1.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexEncoderV1.java
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444982#comment-15444982 ]

Anoop Sam John commented on HBASE-16213:
----------------------------------------

Oh ya..
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444949#comment-15444949 ]

binlijin commented on HBASE-16213:
----------------------------------

All the existing tests for DBE work with this, because they iterate through every value of the DBE enum, including the new one. So there is no new test for the new DBE.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444924#comment-15444924 ] binlijin commented on HBASE-16213: -- Because the rows were not in ascending order: for example, kv5 sorted before kv4, so RowIndexEncoderV1#checkRow would throw an IOException. The same test has already been changed on master.
{code}
  /**
   * Test seeking while file is encoded.
   */
  @Test
  public void testSeekingToBlockWithBiggerNonLength1() throws IOException {
    List sampleKv = new ArrayList();
    KeyValue kv1 = new KeyValue(Bytes.toBytes("aaa"), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    sampleKv.add(kv1);
    KeyValue kv2 = new KeyValue(Bytes.toBytes("aab"), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    sampleKv.add(kv2);
    KeyValue kv3 = new KeyValue(Bytes.toBytes("aac"), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    sampleKv.add(kv3);
    KeyValue kv4 = new KeyValue(Bytes.toBytes("aad"), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    sampleKv.add(kv4);
    KeyValue kv5 = new KeyValue(Bytes.toBytes("d"), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    sampleKv.add(kv5);
    KeyValue toSeek = new KeyValue(Bytes.toBytes(""), Bytes.toBytes("f1"), Bytes.toBytes("q1"),
        Bytes.toBytes("val"));
    seekToTheKey(kv1, sampleKv, toSeek);
  }
{code}
> A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, > HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
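To illustrate why out-of-order rows make a check like RowIndexEncoderV1#checkRow fail, here is a minimal sketch of such an ordering check. It is not the actual HBase code: the class `RowOrderCheckSketch`, its helper `isAscending`, and the choice to accept equal consecutive rows (several cells of one row) are assumptions made for illustration, and the row keys in the usage note are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: a simplified ordering check, not RowIndexEncoderV1.
class RowOrderCheckSketch {
  private byte[] previousRow; // last accepted row, null before the first write

  // Unsigned lexicographic comparison of two byte arrays.
  private static int compareRows(byte[] left, byte[] right) {
    int n = Math.min(left.length, right.length);
    for (int i = 0; i < n; i++) {
      int a = left[i] & 0xff;
      int b = right[i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return left.length - right.length;
  }

  // Accepts a row that sorts at or after the previous one; otherwise throws.
  void checkRow(byte[] row) throws IOException {
    if (previousRow != null && compareRows(previousRow, row) > 0) {
      throw new IOException("Row out of order: "
          + new String(row, StandardCharsets.UTF_8) + " sorts before "
          + new String(previousRow, StandardCharsets.UTF_8));
    }
    previousRow = row.clone();
  }

  // Helper: true if every row in the sequence passes checkRow.
  static boolean isAscending(String... rows) {
    RowOrderCheckSketch checker = new RowOrderCheckSketch();
    try {
      for (String row : rows) {
        checker.checkRow(row.getBytes(StandardCharsets.UTF_8));
      }
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

With these assumptions, `isAscending("aaa", "aab", "aac")` passes, while a sequence such as `"aaa", "aad", "aac"` is rejected at the third row, which is the kind of failure the test data had to be changed to avoid.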
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444846#comment-15444846 ] Anoop Sam John commented on HBASE-16213: Why were the KVs changed in TestSeekToBlockWithEncoders? And were the tests for the new DBE not copied to the branch-1 patch? > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, > HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439351#comment-15439351 ] Hadoop QA commented on HBASE-16213: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 9s | Maven dependency ordering for branch |
| +1 | mvninstall | 1m 55s | branch-1 passed |
| +1 | compile | 0m 49s | branch-1 passed with JDK v1.8.0_101 |
| +1 | compile | 0m 55s | branch-1 passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 47s | branch-1 passed |
| +1 | mvneclipse | 0m 29s | branch-1 passed |
| -1 | findbugs | 2m 8s | hbase-server in branch-1 has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 44s | branch-1 passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 54s | branch-1 passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 13s | the patch passed |
| +1 | compile | 0m 54s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 54s | the patch passed |
| +1 | compile | 0m 59s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 59s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 15m 49s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 26s | the patch passed |
| +1 | findbugs | 2m 59s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 49s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 43s | hbase-common in the patch passed. |
| -1 | unit | 83m 23s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 121m 33s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat |
| | hadoop.hbase.replication.regionserver.TestReplicationSourceManager |
| | hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat |
| | T
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439118#comment-15439118 ] binlijin commented on HBASE-16213: -- OK, I ran the four test cases locally; all passed. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, > HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, > hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438858#comment-15438858 ] Anoop Sam John commented on HBASE-16213: The test failures seem to be unrelated to this patch. Can you please confirm? > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, > HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, > hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438855#comment-15438855 ] Hadoop QA commented on HBASE-16213: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 2m 2s | branch-1 passed |
| +1 | compile | 0m 55s | branch-1 passed with JDK v1.8.0_101 |
| +1 | compile | 0m 59s | branch-1 passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 52s | branch-1 passed |
| +1 | mvneclipse | 0m 34s | branch-1 passed |
| -1 | findbugs | 2m 13s | hbase-server in branch-1 has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 48s | branch-1 passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 58s | branch-1 passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 11s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | compile | 0m 55s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 17m 50s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 28s | the patch passed |
| +1 | findbugs | 3m 14s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 58s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 57s | hbase-common in the patch passed. |
| -1 | unit | 83m 44s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 125m 26s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| | hadoop.hbase.regionserver.TestClusterId |
| | hadoop.hbase.security.token.TestZKSecretWatcher |
| | hadoop.hbase.mapreduce.TestMultiTableSnapshotInp
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438753#comment-15438753 ] binlijin commented on HBASE-16213: -- OK, renamed to branch-1.v4 and uploaded. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, > HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, > hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438746#comment-15438746 ] Yu Li commented on HBASE-16213: --- Please rename the patch to "branch-1.v3.patch" to trigger the UT run against the correct branch, [~aoxiang]. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438720#comment-15438720 ] Hadoop QA commented on HBASE-16213: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 8s | HBASE-16213 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825641/HBASE-16213_branch1_v3.patch |
| JIRA Issue | HBASE-16213 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3286/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.
> A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, > HBASE-16213.branch-1.v1.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438513#comment-15438513 ] Hudson commented on HBASE-16213: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1483 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1483/]) HBASE-16213 A new HFileBlock structure for fast random get. (binlijin) (anoopsamjohn: rev 0d99e827b22f1aedb8595a5d8b76a6085dc5a654)
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexCodecV1.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexSeekerV1.java
* (delete) hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
* (delete) hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
* (add) hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexEncoderV1.java
> A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438506#comment-15438506 ] binlijin commented on HBASE-16213: -- Ok. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438507#comment-15438507 ] binlijin commented on HBASE-16213: -- Ok. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438505#comment-15438505 ] binlijin commented on HBASE-16213: -- OK, I would like to backport this to branch-1. The other work will be done in subtasks. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438481#comment-15438481 ] Anoop Sam John commented on HBASE-16213: I am +1 for pushing this to branch 1.4, as this is a totally optional feature. Please attach the latest backport patch for branch-1. > A new HFileBlock structure for fast random get > -- > > Key: HBASE-16213 > URL: https://issues.apache.org/jira/browse/HBASE-16213 > Project: HBase > Issue Type: New Feature > Components: Performance >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-16213-master_v1.patch, > HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, > HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, > HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, > hfile_block_performance.pptx, hfile_block_performance2.pptx, > hfile_block_performance_E2E.pptx > > > HFileBlock store cells sequential, current when to get a row from the block, > it scan from the first cell until the row's cell. > The new structure store every row's start offset with data, so it can find > the exact row with binarySearch. > I use EncodedSeekPerformanceTest test the performance. > First use ycsb write 100w data, every row have only one qualifier, and > valueLength=16B/64/256B/1k. > Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and > also record HFileBlock's dataSize/dataWithMetaSize in the encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438478#comment-15438478 ]
Anoop Sam John commented on HBASE-16213:
Please link the new issue about the garbage created by tag compression in BufferedDecoder, and the new one for avoiding code duplication.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438475#comment-15438475 ]
Anoop Sam John commented on HBASE-16213:
Pushed to master. Would you like to get this into branch-1 as well? Would you mind adding a release note, binlijin? It should say how to enable the feature and mention the extra space it might take per block. Later we should explain it in more detail in our book. This will be an excellent perf booster for random-read workloads.
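For readers of this thread, the idea in the issue description (store every row's start offset alongside the data, then binary-search those offsets instead of scanning from the first cell) can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch's code: the class name `RowIndexedBlock`, the `[keyLen][key][valLen][value]` row layout, and the helper names are all hypothetical, and keys are assumed to be supplied already sorted in unsigned byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a block whose rows can be located by binary search
// over a per-row offset index. Not the actual HBase block format.
final class RowIndexedBlock {
    private final ByteBuffer data;  // rows laid out back to back
    private final int[] rowOffsets; // start offset of each row, in key order

    private RowIndexedBlock(ByteBuffer data, int[] rowOffsets) {
        this.data = data;
        this.rowOffsets = rowOffsets;
    }

    // Builds a block from keys that are already sorted (assumption).
    static RowIndexedBlock build(String[] sortedKeys, String[] values) {
        int n = sortedKeys.length;
        byte[][] k = new byte[n][];
        byte[][] v = new byte[n][];
        int size = 0;
        for (int i = 0; i < n; i++) {
            k[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            v[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + k[i].length + 4 + v[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position(); // remember where this row starts
            buf.putInt(k[i].length).put(k[i]).putInt(v[i].length).put(v[i]);
        }
        return new RowIndexedBlock(buf, offsets);
    }

    // Binary search over the row offsets; returns null if the key is absent.
    String get(String key) {
        byte[] target = key.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareKeyAt(rowOffsets[mid], target);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return valueAt(rowOffsets[mid]);
            }
        }
        return null;
    }

    // Unsigned lexicographic comparison of the stored key with the target.
    private int compareKeyAt(int offset, byte[] target) {
        int keyLen = data.getInt(offset);
        int n = Math.min(keyLen, target.length);
        for (int i = 0; i < n; i++) {
            int a = data.get(offset + 4 + i) & 0xff;
            int b = target[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return keyLen - target.length;
    }

    private String valueAt(int offset) {
        int valPos = offset + 4 + data.getInt(offset);
        int valLen = data.getInt(valPos);
        byte[] v = new byte[valLen];
        for (int i = 0; i < valLen; i++) {
            v[i] = data.get(valPos + 4 + i);
        }
        return new String(v, StandardCharsets.UTF_8);
    }

    // Extra space the offset index adds in this sketch: one int per row.
    int indexOverheadBytes() {
        return 4 * rowOffsets.length;
    }
}
```

In this sketch the extra space is four bytes per row, so blocks with many small rows pay proportionally more for the index; the real per-block overhead should be taken from the release note rather than from this example.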
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438350#comment-15438350 ]
Yu Li commented on HBASE-16213:
My late +1, good job fella [~aoxiang], and thanks all for review [~anoop.hbase] [~ram_krish] [~stack]
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437173#comment-15437173 ]
Anoop Sam John commented on HBASE-16213:
Planning to commit tomorrow my time unless I hear objections.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436653#comment-15436653 ]
ramkrishna.s.vasudevan commented on HBASE-16213:
I had already +1ed this. +1.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436651#comment-15436651 ]
Hadoop QA commented on HBASE-16213:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 6s | master passed |
| +1 | compile | 0m 46s | master passed with JDK v1.8.0_101 |
| +1 | compile | 0m 53s | master passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 46s | master passed |
| +1 | mvneclipse | 0m 29s | master passed |
| +1 | findbugs | 2m 53s | master passed |
| +1 | javadoc | 0m 44s | master passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 53s | master passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 14s | the patch passed |
| +1 | compile | 0m 57s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 57s | the patch passed |
| +1 | compile | 0m 58s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 58s | the patch passed |
| +1 | checkstyle | 1m 2s | the patch passed |
| +1 | mvneclipse | 0m 29s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 29m 58s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 27s | the patch passed |
| +1 | findbugs | 3m 30s | the patch passed |
| +1 | javadoc | 0m 47s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 57s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 47s | hbase-common in the patch passed. |
| +1 | unit | 92m 59s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 147m 26s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825430/HBASE-16213-master_v6.patch |
| JIRA Issue | HBASE-16213 |
| Optional Tests | asflicense javac javad
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436630#comment-15436630 ]
Anoop Sam John commented on HBASE-16213:
+1. [~ram_krish], [~saint@gmail.com] chance for one more +1?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436619#comment-15436619 ]
binlijin commented on HBASE-16213:
Uploaded the E2E results in hfile_block_performance_E2E.pptx.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434967#comment-15434967 ]
binlijin commented on HBASE-16213:
Good.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434804#comment-15434804 ]
Anoop Sam John commented on HBASE-16213:
One thought after one more look:
bq. private List<Integer> rowsOffset = new ArrayList<Integer>(64);
So we add every row offset to this List and then finally write all the ints to the HFile block's stream. Every addition to the List needs an object creation (int to Integer autoboxing), which produces a lot of garbage. We can avoid this. Instead of a List we can create a ByteArrayOutputStream (see org.apache.hadoop.hbase.io.BAOS), write the offsets in their final serialized form, and at the end write getBuffer() at once. The capacity of the BAOS can be initialized to 64 * 4; it will resize automatically as needed. Also, #rows can be calculated as BAOS#size()/4. WDYT?
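The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the patch's code: it uses the JDK's `java.io.ByteArrayOutputStream` (whose `toByteArray()` copies the buffer) as a stand-in for the HBase stream class and its `getBuffer()`, and the class name `RowOffsetBuffer` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: collect row offsets as raw big-endian ints in a byte
// stream, so that no Integer object is allocated per row.
final class RowOffsetBuffer {
    // Room for 64 offsets of 4 bytes each; the stream grows as needed.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(64 * 4);

    // Appends one offset in its final serialized form (4 bytes, big-endian).
    void add(int offset) {
        out.write(offset >>> 24);
        out.write(offset >>> 16);
        out.write(offset >>> 8);
        out.write(offset);
    }

    // The number of rows follows from the byte count; no separate counter.
    int rowCount() {
        return out.size() / 4;
    }

    // Serialized offsets, ready to be appended to the block in one write.
    // (Assumption: the JDK stream copies here; a stream exposing its internal
    // buffer would avoid the copy.)
    byte[] serialized() {
        return out.toByteArray();
    }
}
```

Compared with a `List<Integer>`, the only per-row cost here is four byte writes into a growing array, and the serialized form is available without a second pass over the offsets.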
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434178#comment-15434178 ]
Hadoop QA commented on HBASE-16213:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 14s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 2s | master passed |
| +1 | compile | 0m 43s | master passed with JDK v1.8.0_101 |
| +1 | compile | 0m 52s | master passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 43s | master passed |
| +1 | mvneclipse | 0m 27s | master passed |
| +1 | findbugs | 2m 37s | master passed |
| +1 | javadoc | 0m 39s | master passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 50s | master passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 2s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 44s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvneclipse | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 24s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 26s | the patch passed |
| +1 | findbugs | 3m 7s | the patch passed |
| +1 | javadoc | 0m 42s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 52s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 39s | hbase-common in the patch passed. |
| -1 | unit | 96m 33s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 146m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| Timed out junit tests | org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
| | org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot |
| | org.apache.hadoop.hbase.snapshot.TestMobExportSnapsh
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432918#comment-15432918 ]
binlijin commented on HBASE-16213:
Sorry sir, I am on vacation these days; I will check it tomorrow.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432922#comment-15432922 ]
binlijin commented on HBASE-16213:
Thank you sir, I am on vacation these days; I will check it tomorrow.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432622#comment-15432622 ]
Anoop Sam John commented on HBASE-16213:
Yes, in my initial comments I was also suggesting we could do that in new JIRAs. But towards the end there is a lot of duplication. It would be better if the new DBE could in some way extend the NoOp DBE equivalent. Let us know if you need any help.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432445#comment-15432445 ]

ramkrishna.s.vasudevan commented on HBASE-16213:

I saw that the comments in RB from Anoop are about code refinement and the duplicate code. Are you planning to change it, [~aoxiang]? I think I initially suggested the same, and I saw some refinement you had tried to do.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426796#comment-15426796 ]

Hadoop QA commented on HBASE-16213:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 8s | master passed |
| +1 | compile | 0m 58s | master passed with JDK v1.8.0_101 |
| +1 | compile | 0m 55s | master passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 44s | master passed |
| +1 | mvneclipse | 0m 28s | master passed |
| +1 | findbugs | 2m 43s | master passed |
| +1 | javadoc | 0m 40s | master passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 51s | master passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 54s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 54s | the patch passed |
| +1 | checkstyle | 0m 44s | the patch passed |
| +1 | mvneclipse | 0m 28s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 32s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 26s | the patch passed |
| +1 | findbugs | 3m 7s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 52s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 42s | hbase-common in the patch passed. |
| +1 | unit | 94m 42s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 145m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-18 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12824354/HBASE-16213-master_v4.patch |
| JIRA Issue | HBASE-16213 |
| Optional Tests | asflicense javac javado
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426218#comment-15426218 ]

Hadoop QA commented on HBASE-16213:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 11s | master passed |
| +1 | compile | 0m 47s | master passed with JDK v1.8.0_101 |
| +1 | compile | 0m 54s | master passed with JDK v1.7.0_101 |
| +1 | checkstyle | 0m 45s | master passed |
| +1 | mvneclipse | 0m 29s | master passed |
| +1 | findbugs | 2m 44s | master passed |
| +1 | javadoc | 0m 41s | master passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 52s | master passed with JDK v1.7.0_101 |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_101 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.7.0_101 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 49s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 27s | the patch passed |
| +1 | findbugs | 3m 10s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0_101 |
| +1 | javadoc | 0m 53s | the patch passed with JDK v1.7.0_101 |
| +1 | unit | 1m 42s | hbase-common in the patch passed. |
| -1 | unit | 15m 31s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 66m 18s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.io.hfile.TestSeekTo |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-18 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12824312/HBASE-16213-master_v3.patch
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426130#comment-15426130 ]

binlijin commented on HBASE-16213:

Thanks very much @ram.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426127#comment-15426127 ]

ramkrishna.s.vasudevan commented on HBASE-16213:

I verified the updated patch and the comments seem to be addressed. I have not checked the logic of how the ByteBuffers are traversed back and forth; I believe the test cases would have caught any problems there. There may be a lot of garbage generated because of the many slices and duplicates. I think that can be looked at later if it really is a problem. As of now I am fine with this patch. +1.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417009#comment-15417009 ]

Anoop Sam John commented on HBASE-16213:

Yes, go for the code refinement. Please do V1 first, which does not have any space optimizations. We can add V2 and/or V3 later once this is in. This is excellent work.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416999#comment-15416999 ]

Yu Li commented on HBASE-16213:

We already have a default DBE (NONE), right? I also agree with implementing this as a kind of DBE, but IMHO it's fair to make this new implementation the default DBE instead of NONE; users could still explicitly choose another DATA_BLOCK_ENCODING (set in HColumnDescriptor) if they would like to. :-)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416969#comment-15416969 ]

Yu Li commented on HBASE-16213:

Nice work and nice testing. Since we still need to wait for our machines to be ready for further E2E perf testing (due to a machine room relocation), I suggest starting code refinement according to the review comments from Ram and others, and updating the patch on Review Board, [~aoxiang]. By the way, it would be great to see performance data from your side, sir [~stack]. Thanks.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416612#comment-15416612 ]

Anoop Sam John commented on HBASE-16213:

bq. Make it default

I am not sure, because it is implemented as a type of DBE. That means we make one DBE the default. This will help random gets but not range scans much. One more thing to note is that when this is used, users cannot use the other DBE optimizations (space saving). That is also true, because all DBE implementations rely on the fact that reads are linear over an HFile block: the key and/or value of one KV can be obtained only by reading all the previous cells in the block. So implementing this as a kind of DBE is also correct, IMO. We should get this into 2.0. Good one.
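The contrast discussed in this thread (a linear walk over the cells of a block versus a binary search over stored row start offsets) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class name RowIndexSketch, the per-row layout [int keyLen][key][int valLen][value], and the use of String comparison on ASCII keys are all hypothetical simplifications of what an HFileBlock actually holds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified block: one cell per row, rows sorted by key.
// Assumed layout per row: [int keyLen][key bytes][int valLen][value bytes].
final class RowIndexSketch {
    private final ByteBuffer data;   // the "data" part of the block
    private final int[] rowOffsets;  // start offset of every row (the row index)

    RowIndexSketch(String[] sortedKeys, String[] values) {
        int size = 0;
        for (int i = 0; i < sortedKeys.length; i++) {
            size += 8 + utf8(sortedKeys[i]).length + utf8(values[i]).length;
        }
        data = ByteBuffer.allocate(size);
        rowOffsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            rowOffsets[i] = data.position(); // remember where this row starts
            byte[] k = utf8(sortedKeys[i]);
            byte[] v = utf8(values[i]);
            data.putInt(k.length).put(k).putInt(v.length).put(v);
        }
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reads the key of the row starting at the given offset (absolute reads only).
    private String keyAt(int offset) {
        int len = data.getInt(offset);
        byte[] k = new byte[len];
        for (int i = 0; i < len; i++) {
            k[i] = data.get(offset + 4 + i);
        }
        return new String(k, StandardCharsets.UTF_8);
    }

    // Linear seek: walk from the first row until the key is found or passed.
    // Returns the row's start offset, or -1 if the key is absent.
    int linearSeek(String key) {
        int pos = 0;
        while (pos < data.capacity()) {
            int cmp = keyAt(pos).compareTo(key);
            if (cmp == 0) return pos;
            if (cmp > 0) return -1;
            int keyLen = data.getInt(pos);
            int valLen = data.getInt(pos + 4 + keyLen);
            pos += 8 + keyLen + valLen; // skip to the next row
        }
        return -1;
    }

    // Indexed seek: binary search over the stored row start offsets.
    int indexedSeek(String key) {
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = keyAt(rowOffsets[mid]).compareTo(key);
            if (cmp == 0) return rowOffsets[mid];
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        RowIndexSketch block = new RowIndexSketch(
            new String[] {"row1", "row2", "row3"},
            new String[] {"a", "bb", "ccc"});
        System.out.println(block.linearSeek("row3") + " " + block.indexedSeek("row3"));
    }
}
```

Both methods return the same offset for any key; the difference is that the linear seek touches every preceding row, while the indexed seek touches roughly log2(number of rows) keys at the cost of storing one extra int per row. That extra storage is the size overhead weighed elsewhere in this thread.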
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416261#comment-15416261 ]

binlijin commented on HBASE-16213:

Yes sir, you can try altering the table's DATA_BLOCK_ENCODING from 'NONE' to 'ROW_INDEX_V1' to test it.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416245#comment-15416245 ]

binlijin commented on HBASE-16213:

Yes.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416244#comment-15416244 ]

stack commented on HBASE-16213:

This makes sense to me. The latency improvements are probably so small as to go unnoticed (relative) but yeah, less savings in a bunch of cpu... I can measure too
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416234#comment-15416234 ]

binlijin commented on HBASE-16213:

I will test it when the machine is ready. The last time, when I produced hfile-cpu.png, I did not see throughput improve, but CPU usage was cut down considerably.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416230#comment-15416230 ]

binlijin commented on HBASE-16213:

Yes sir, we can always turn this feature on once it is stable. Yes, the data is all cached.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416213#comment-15416213 ]

stack commented on HBASE-16213:

Why not turn this feature on always? If 16-byte values only add 5% to the size (this is the worst case), the benefit far outweighs the extra size. When you say seek, you are seeking cached data? Nice numbers, [~aoxiang].
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416205#comment-15416205 ]

stack commented on HBASE-16213:

I would imagine it makes a lot of difference in CPU usage for random reads; this linear seek is one of our main CPU consumers when random reading.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415690#comment-15415690 ]
ramkrishna.s.vasudevan commented on HBASE-16213:

> Also this may not improve E2E throughput or latency much, but may cut down the CPU usage.

I am not sure I get this. For random reads it is the seek which takes time; if that is improved then we should see some perf gain in terms of throughput or latency. Or do you mean the same % will not be felt in E2E performance, i.e. the gain would be much reduced but still visible?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415217#comment-15415217 ]
binlijin commented on HBASE-16213:

Also this may not improve E2E throughput or latency much, but may cut down the CPU usage.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415202#comment-15415202 ]
binlijin commented on HBASE-16213:

Will do the E2E test later.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415197#comment-15415197 ]
binlijin commented on HBASE-16213:

Yes, much of the code should be reused; at first I just made it work quickly. I do not think it needs any specific test case for these new types.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415193#comment-15415193 ]
binlijin commented on HBASE-16213:

I put the meta data size and overhead % (meta size/data size) in another table on the 1st page, because there are too many columns to put in one table.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414971#comment-15414971 ]
Anoop Sam John commented on HBASE-16213:

To add to what I said abt test report, it would be great to note down the E2E throughput gain and/or latency reduction with this, not just a seek related improvement. When u get a chance pls do it for the diff data size and block size and add to this ppt. Thanks man.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414885#comment-15414885 ]
ramkrishna.s.vasudevan commented on HBASE-16213:

Perf improvement is great. With smaller blocks and bigger value sizes the impact is smaller, as only very few rows have to be searched, so the seek does not take much time. The meta data overhead is at most 4k more, I think. Having multiple columns for the same row should also incur only the same meta data overhead (if the total size is going to amount to approx 1K).

Went through the patch. Some of the tag related decode and encode can be moved to a subclass to avoid duplicating the existing code, I think. And see if the SeekState's Cell impl should be altogether new in the new EncodedSeeker state implementation. Maybe they can be reused. I have not checked if there is something different so that it is not getting reused.

I think all the existing tests for DBE would work with this because they iterate through all values of the DBE enum, so the new ones will be covered. Do you need any specific test case for these new types?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414747#comment-15414747 ]
Anoop Sam John commented on HBASE-16213:

Thanks for the detailed test and info. The seek perf improvement is excellent. The size delta because of the row offset meta data seems very small when the cell size is not so small (>100 bytes, I'd say). We need this info so that the doc around this feature can explain well how much the block size increases because of this new type of DBE.

Small suggestion for your ppt 1st page: just make one more column and explicitly add the meta data size. Else every time we read it, we have to subtract 2 values. And maybe add a field like overhead % also, i.e. (Total size - data size)/data size.
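The overhead % suggested in the comment above can be written down as a tiny helper. This is only an illustrative sketch, not code from the patch: the class and method names are hypothetical, and the sizes in the usage note are made-up inputs, not measurements from the attached spreadsheets.

```java
// Hypothetical helper, for illustration only (not part of the HBASE-16213 patch).
class OverheadPercent {
    /**
     * Overhead of the row-offset meta data relative to the raw data size,
     * i.e. (total size - data size) / data size, expressed as a percentage.
     */
    static double overheadPercent(long dataSize, long dataWithMetaSize) {
        long metaSize = dataWithMetaSize - dataSize; // the column the comment asks to show explicitly
        return 100.0 * metaSize / dataSize;
    }
}
```

For example, with an assumed dataSize of 65536 bytes and dataWithMetaSize of 66560 bytes, the meta data is 1024 bytes and overheadPercent returns 1.5625.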
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400366#comment-15400366 ]
binlijin commented on HBASE-16213:

    alter 'table_put_10B_100w', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOCKSIZE => '65536'}
    major_compact 'table_put_10B_100w'

We can alter the table with different BLOCKSIZE also.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400363#comment-15400363 ]
binlijin commented on HBASE-16213:

When I do the test:
(1) create "table_put_10B_100w", "cf"
(2) use ycsb to write 100w (1 million) rows to table "table_put_10B_100w", with value=16B and BLOCKSIZE => '65536'
(3) flush table 'table_put_10B_100w'
(4) alter 'table_put_10B_100w', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE'}
    major_compact 'table_put_10B_100w'
    Now we can do the test with DATA_BLOCK_ENCODING => 'NONE'.
(5) alter 'table_put_10B_100w', {NAME => 'cf', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'}
    major_compact 'table_put_10B_100w'
    Now we can do the test with DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400208#comment-15400208 ]
stack commented on HBASE-16213:

This is hard to test [~aoxiang], or at least hard to do an apples to apples compare, because I have to write files with the indexes on them first, right? And then I'd run a random read load that ignored the indices vs one that didn't. It's a config? Thanks.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385305#comment-15385305 ]
stack commented on HBASE-16213:

Those are nice numbers [~aoxiang] Let me give it a go
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385304#comment-15385304 ]
stack commented on HBASE-16213:

It is ok [~carp84] Branch-1 is good. I have a little rig w/ branch-1 so I can try it out.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385225#comment-15385225 ]
Yu Li commented on HBASE-16213:

Let's update the patch for master branch (instead of branch-1) on RB to make it easier for review fella :-)
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385212#comment-15385212 ]
binlijin commented on HBASE-16213:

Uploaded a CPU usage comparison. Tested with ycsb: key=10B, value=16B, 100w rows, BLOCKSIZE => '65536'.

    NONE:          32% = user 28% + system 4%
    ROW_INDEX_V1:  18% = user 14% + system 4%
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385092#comment-15385092 ]
binlijin commented on HBASE-16213:

Yes, sir. I used EncodedSeekPerformanceTest to do some performance tests with it, and already posted the results. I talked about it with Anoop Sam John, and he says we only need ROW_INDEX_V1. I think this feature is more useful for tables with many small cells. I think you saw the patch for branch-1, and I have already changed it.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384547#comment-15384547 ]
stack commented on HBASE-16213:

[~aoxiang] You see above?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383619#comment-15383619 ]
stack commented on HBASE-16213:

Nice. Seek in the row when random reading is one of the main consumers of CPU.

Why bother having two encoders? Why not just one that does row and column family index?

Any idea on how much more work we are doing when this is enabled (CPU?). Is it less with this feature on or more? Under what circumstances do you think?

Let me try this. Meantime here are some comments on the patch:

In class comment, either in encoder or decoder, describe how the encoding works, what layout looks like with some advice on when to use it. Can then copy paste as the release note on this issue.

For...

    builder.write(cell);

Could the above return a length so you don't have to reget it on the next line with:

    int size = KeyValueUtil.length(cell);

The length parse costs.

Anywhere that you can get count of how many kvs in block that you can use here:

    List kvs = new ArrayList();

Remove these...

    // TODO Auto-generated method stub

Put these together?

    LOG.trace("RowNumber: " + rowsOffset.size());
    LOG.trace("onDiskSize: " + onDiskSize);

One line is easier to read than two...

Got half way through... will be back w/ more. Nice.
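To make the difference between the linear seek and the row-index seek concrete, here is a minimal, self-contained sketch. It is not the ROW_INDEX_V1 implementation and does not use any HBase classes: the record layout (a 2-byte key length followed by the key bytes), the class name RowIndexSketch and its methods are all assumptions made for illustration. The only idea taken from the issue is that the start offset of every row is stored alongside the data, so a row can be located with a binary search instead of a scan from the first cell.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the layout and all names here are hypothetical.
class RowIndexSketch {
    final ByteBuffer data;   // records back to back: [2-byte key length][key bytes]
    final int[] rowOffsets;  // start offset of every record, in key order

    RowIndexSketch(String[] sortedKeys) {
        int size = 0;
        for (String k : sortedKeys) {
            size += 2 + k.getBytes(StandardCharsets.UTF_8).length;
        }
        data = ByteBuffer.allocate(size);
        rowOffsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            byte[] k = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            rowOffsets[i] = data.position(); // remember where this row starts
            data.putShort((short) k.length).put(k);
        }
    }

    // Compare the key stored at the given offset with target, byte by byte (unsigned).
    int compareAt(int offset, byte[] target) {
        int len = data.getShort(offset);
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int a = data.get(offset + 2 + i) & 0xff;
            int b = target[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len - target.length;
    }

    // Linear seek: walk the records from the start of the block. Returns the offset or -1.
    int seekLinear(byte[] target) {
        int offset = 0;
        while (offset < data.capacity()) {
            int c = compareAt(offset, target);
            if (c == 0) {
                return offset;
            }
            if (c > 0) {
                return -1; // keys are sorted, so we have passed the target
            }
            offset += 2 + data.getShort(offset);
        }
        return -1;
    }

    // Indexed seek: binary search over the stored row offsets. Returns the offset or -1.
    int seekBinary(byte[] target) {
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareAt(rowOffsets[mid], target);
            if (c == 0) {
                return rowOffsets[mid];
            }
            if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

Both methods return the same offset for a key that is present; the linear version touches on the order of n records while the indexed version touches on the order of log n, which is where the CPU saving discussed in this thread would come from under these assumptions.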
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378783#comment-15378783 ]
binlijin commented on HBASE-16213:

ROW_INDEX_V2 will store column family only once in a HFileBlock.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378782#comment-15378782 ]
binlijin commented on HBASE-16213:

[~tedyu]

    org.apache.hadoop.hbase.io.encoding.TestEncodedSeekers
    org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
    org.apache.hadoop.hbase.io.encoding.TestDataBlockEncoders
    org.apache.hadoop.hbase.io.encoding.TestSeekToBlockWithEncoders

These four unit tests already test ROW_INDEX_V1.
For larger key/values (key=10B, value=1k), there is a 10% improvement.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378779#comment-15378779 ]
binlijin commented on HBASE-16213:

ROW_INDEX_V2 stores the column family only once, so the overhead will not be as large, but currently there is no such version for master. The third version will store every row only once; this version is not implemented yet.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378053#comment-15378053 ]
Anoop Sam John commented on HBASE-16213:

Maybe u can consider a key of 50 bytes and a value of 256 bytes. In such a case the #rows per HFileBlock will be fewer and so the gain may be less.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377886#comment-15377886 ]

Ted Yu commented on HBASE-16213:

Lijin:
Can you add unit tests for ROW_INDEX_V1?
In the comparison, both key and value are 10B in size. Can you perform the comparison with larger keys and values?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376833#comment-15376833 ]

Anoop Sam John commented on HBASE-16213:

Thanks. So when the per-row size is, say, 256 bytes, we will have a 1 KB overhead for storing these offsets (assuming a 64 KB block size). So in places such as calculating bucket cache bucket sizes, the user will also have to account for this metadata overhead.
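The 1 KB figure in the comment above can be reproduced with simple arithmetic: a 64 KB block holds 256 rows of 256 bytes, and 1 KB spread over 256 rows is 4 bytes per row. The sketch below assumes one 4-byte offset per row, which is inferred from those numbers rather than stated in the thread. The class and method names are hypothetical.

```java
/** Illustrative estimate of per-block row-index overhead; not HBase code. */
final class RowIndexOverhead {

    /** Number of whole rows that fit in a block, ignoring any other block metadata. */
    static long rowsPerBlock(long blockSizeBytes, long rowSizeBytes) {
        return blockSizeBytes / rowSizeBytes;
    }

    /** Bytes spent on offsets, assuming one fixed-width offset per row. */
    static long indexOverheadBytes(long blockSizeBytes, long rowSizeBytes, int bytesPerOffset) {
        return rowsPerBlock(blockSizeBytes, rowSizeBytes) * bytesPerOffset;
    }
}
```

For example, `RowIndexOverhead.indexOverheadBytes(64 * 1024, 256, 4)` gives 1024 bytes. Smaller rows mean more rows per block and therefore proportionally more offset metadata.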
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374627#comment-15374627 ]

binlijin commented on HBASE-16213:

The first version does not have any storage optimization. The perf comparison is between the new algorithm and no encoding.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374617#comment-15374617 ]

binlijin commented on HBASE-16213:

The patch for master is on RB: https://reviews.apache.org/r/49980/
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374614#comment-15374614 ]

ramkrishna.s.vasudevan commented on HBASE-16213:

So this basically adds a new encoder algorithm. In the perf comparison, was the new algorithm compared against no encoding or against one of the existing encoding algorithms?
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372779#comment-15372779 ]

binlijin commented on HBASE-16213:

Currently the patch is based on branch-1.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372697#comment-15372697 ]

Anoop Sam John commented on HBASE-16213:

Interesting. It seems the attached patch is not based on the master code base. Can you attach one based on master? RB would be better.
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372419#comment-15372419 ]

binlijin commented on HBASE-16213:

Perf test
Wrote 10 million (1000w) rows to table_put_10B_1000w, with key length 10B, value length 10B, and a 64K HFileBlock size.
Used ycsb to test the performance, with only one regionserver and one client (one thread), running the client on the regionserver's machine. The result is:

Encoding        Throughput (ops/sec)
NONE            9034
ROW_INDEX_V1    12817
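To put the two throughput figures above in relative terms, the ratio 12817 / 9034 is roughly 1.42, that is, about 42% more operations per second for ROW_INDEX_V1 than for NONE in this single-client run. The small helper below just makes that arithmetic explicit. The class and method names are hypothetical and not part of ycsb or HBase.

```java
/** Illustrative arithmetic on the reported throughput numbers; not part of any benchmark tool. */
final class ThroughputGain {

    /** Ratio of candidate throughput to baseline throughput. */
    static double speedup(double baselineOpsPerSec, double candidateOpsPerSec) {
        return candidateOpsPerSec / baselineOpsPerSec;
    }

    /** Relative gain over the baseline, in percent. */
    static double percentGain(double baselineOpsPerSec, double candidateOpsPerSec) {
        return (speedup(baselineOpsPerSec, candidateOpsPerSec) - 1.0) * 100.0;
    }
}
```

Calling `ThroughputGain.speedup(9034, 12817)` with the reported numbers yields a value close to 1.42. The figure applies only to the 10B key and 10B value workload described in the comment.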
[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get
[ https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372193#comment-15372193 ]

binlijin commented on HBASE-16213:

ROW_INDEX_V1 is the first version.
ROW_INDEX_V2 stores the column family only once.