[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2019-10-02 Thread Jean-Marc Spaggiari (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943237#comment-16943237
 ] 

Jean-Marc Spaggiari commented on HBASE-16213:
-

Is there any follow-up JIRA fot V2 and V3?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: Lijin Bin
>Assignee: Lijin Bin
>Priority: Major
> Fix For: 1.4.0, 2.0.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2019-09-22 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1693#comment-1693
 ] 

stack commented on HBASE-16213:
---

HBASE-23055 will allow being able to set DataBlockEncoding.ROW_INDEX_V1 on 
hbase:meta.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
>Priority: Major
> Fix For: 1.4.0, 2.0.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2018-01-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319822#comment-16319822
 ] 

stack commented on HBASE-16213:
---

bq. Not actually but yes it worth a try boss stack

I'll report back my findings. Thanks [~carp84]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2018-01-09 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319613#comment-16319613
 ] 

Yu Li commented on HBASE-16213:
---

bq. Did you fellows deploy this on hbase:meta?
Not actually but yes it worth a try boss [~stack]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2018-01-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318828#comment-16318828
 ] 

stack commented on HBASE-16213:
---

[~carp84] + [~aoxiang] Did you fellows deploy this on hbase:meta? Thanks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2017-08-27 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143316#comment-16143316
 ] 

Yu Li commented on HBASE-16213:
---

bq. I think V2 can replace V1, V2 have space optimize and the same seek perf.
Let's get HBASE-16594 in then if no objections ([~anoop.hbase] please let us 
know if any comments here sir, thanks), then people like our mighty 
[~yangzhe1991] could use it and further improve it (smile).

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213.branch-1.v1.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx, hfile_block_performance.pptx, hfile-cpu.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785059#comment-15785059
 ] 

binlijin commented on HBASE-16213:
--

Looks like a good idea, i do not think it deeply..

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785035#comment-15785035
 ] 

Phil Yang commented on HBASE-16213:
---

bq. because it is need to record how many versions there are.
We have to read Cells one by one within a row ? If so it is no need to know how 
many versions we have and we just use the previous qualifier if the current 
Cell has no qualifier?  Correct me if I am wrong. Thanks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785001#comment-15785001
 ] 

binlijin commented on HBASE-16213:
--

Yes, must in one family.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15785000#comment-15785000
 ] 

binlijin commented on HBASE-16213:
--

I think V2 can replace V1, V2 have space optimize and the same seek perf.
create several kinds of structures? I think it is not a problem.
And with the idea of V2 we can also save qualifier only once for versions of 
cell, right? I think it is hard to see, because it is need to record how many 
versions there are.


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784990#comment-15784990
 ] 

Phil Yang commented on HBASE-16213:
---

And for an HFile all cells are must in one family? Am I miss something? Thanks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-12-29 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784969#comment-15784969
 ] 

Phil Yang commented on HBASE-16213:
---

Hi [~aoxiang] 
Thanks for your important work.

Does V2 have any weakness comparing with V1? According to their formats it 
seems that V2 only has advantage? :) 1.4 or 2.0 have not been released so can 
we just improve this structure rather than create several kinds of structures? 

And with the idea of V2 we can also save qualifier only once for versions of 
cell, right? 

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-31 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451820#comment-15451820
 ] 

Yu Li commented on HBASE-16213:
---

Ok, makes sense, thanks for the quick response sir [~mantonov] :-)

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-31 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451649#comment-15451649
 ] 

Mikhail Antonov commented on HBASE-16213:
-

[~carp84] this is a nice work for sure, I just think it's a bit too late for 
new features like this one to go to 1.3. Let's keep it in 1.4 (branch-1) for 
now, then depending on more prod testing and numbers on how much we can win we 
can cherry-pick for 1.3.1..

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-30 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451208#comment-15451208
 ] 

Yu Li commented on HBASE-16213:
---

Maybe a little bit late but I'm wondering whether this is a good one for 1.3? 
It's a good but add-on feature which has few changes to core codes. [~mantonov]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447506#comment-15447506
 ] 

Hudson commented on HBASE-16213:


FAILURE: Integrated in Jenkins build HBase-1.4 #378 (See 
[https://builds.apache.org/job/HBase-1.4/378/])
HBASE-16213 A new HFileBlock structure for fast random get. (binlijin) 
(anoopsamjohn: rev c899897bc8dc4a7eccc9e2a80fd05ad55654f18e)
* (edit) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteArrayOutputStream.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexSeekerV1.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexCodecV1.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexEncoderV1.java


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-28 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444982#comment-15444982
 ] 

Anoop Sam John commented on HBASE-16213:


Oh ya.. 

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-28 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444949#comment-15444949
 ] 

binlijin commented on HBASE-16213:
--

 All the existing tests for DBE would work with this because the new DBE enum 
will iterate through all.So there is no new test for the new DBE.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-28 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444924#comment-15444924
 ] 

binlijin commented on HBASE-16213:
--

Because the rows are not ascending order. For example kv5 is sort order before 
kv4. So the RowIndexEncoderV1#checkRow will throw IOException, and the master 
already changed also.

{code}
  /**
   * Test seeking while file is encoded.
   */
  @Test
  public void testSeekingToBlockWithBiggerNonLength1() throws IOException {
List sampleKv = new ArrayList();
KeyValue kv1 = new KeyValue(Bytes.toBytes("aaa"), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
sampleKv.add(kv1);
KeyValue kv2 = new KeyValue(Bytes.toBytes("aab"), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
sampleKv.add(kv2);
KeyValue kv3 = new KeyValue(Bytes.toBytes("aac"), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
sampleKv.add(kv3);
KeyValue kv4 = new KeyValue(Bytes.toBytes("aad"), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
sampleKv.add(kv4);
KeyValue kv5 = new KeyValue(Bytes.toBytes("d"), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
sampleKv.add(kv5);
KeyValue toSeek = new KeyValue(Bytes.toBytes(""), Bytes.toBytes("f1"), 
Bytes.toBytes("q1"),
Bytes.toBytes("val"));
seekToTheKey(kv1, sampleKv, toSeek);
  }
{code}


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-28 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15444846#comment-15444846
 ] 

Anoop Sam John commented on HBASE-16213:


Why the KV changes in TestSeekToBlockWithEncoders?   Tests for the new DBE, not 
copied to branch-1 patch?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.branch-1.v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439351#comment-15439351
 ] 

Hadoop QA commented on HBASE-16213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
55s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 8s 
{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
15m 49s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 43s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 23s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 121m 33s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.mapred.TestMultiTableSnapshotInputFormat |
|   | hadoop.hbase.replication.regionserver.TestReplicationSourceManager |
|   | hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat |
| T

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439118#comment-15439118
 ] 

binlijin commented on HBASE-16213:
--

OK, run the four testcase locally, all passed.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438858#comment-15438858
 ] 

Anoop Sam John commented on HBASE-16213:


Test failures seems to be not related to this patch..  Can u confirm once pls. 

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438855#comment-15438855
 ] 

Hadoop QA commented on HBASE-16213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
2s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
52s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} branch-1 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s 
{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
17m 50s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 44s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 125m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
|   | hadoop.hbase.regionserver.TestClusterId |
|   | hadoop.hbase.security.token.TestZKSecretWatcher |
|   | hadoop.hbase.mapreduce.TestMultiTableSnapshotInp

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438753#comment-15438753
 ] 

binlijin commented on HBASE-16213:
--

Ok, rename to branch-1.v4 and upload.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.branch-1.v4.patch, 
> HBASE-16213.patch, HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> hfile-cpu.png, hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438746#comment-15438746
 ] 

Yu Li commented on HBASE-16213:
---

Please rename the patch to "branch-1.v3.patch" to trigger the UT against 
correct branch [~aoxiang]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438720#comment-15438720
 ] 

Hadoop QA commented on HBASE-16213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s {color} 
| {color:red} HBASE-16213 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825641/HBASE-16213_branch1_v3.patch
 |
| JIRA Issue | HBASE-16213 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/3286/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, 
> HBASE-16213.branch-1.v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438513#comment-15438513
 ] 

Hudson commented on HBASE-16213:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #1483 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1483/])
HBASE-16213 A new HFileBlock structure for fast random get. (binlijin) 
(anoopsamjohn: rev 0d99e827b22f1aedb8595a5d8b76a6085dc5a654)
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexCodecV1.java
* (edit) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexSeekerV1.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestSeekToBlockWithEncoders.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
* (add) 
hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/RowIndexEncoderV1.java


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438506#comment-15438506
 ] 

binlijin commented on HBASE-16213:
--

Ok.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438507#comment-15438507
 ] 

binlijin commented on HBASE-16213:
--

Ok.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438505#comment-15438505
 ] 

binlijin commented on HBASE-16213:
--

OK, i like to backport to branch-1. 
And other works will done in subtasks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438481#comment-15438481
 ] 

Anoop Sam John commented on HBASE-16213:


Am +1 for pushing this to branch 1.4 as this is totally an optional feature. 
Pls attach a latest backport for branch-1

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438478#comment-15438478
 ] 

Anoop Sam John commented on HBASE-16213:


Pls link the new issue of garbage for Tags compress in BufferedDecoder and the 
new one for avoiding code duplication.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438475#comment-15438475
 ] 

Anoop Sam John commented on HBASE-16213:


Pushed to master.  You would like to get this in branch-1 also?  Mind adding a 
Release Notes binlijin?  How to enable the feature.  And to mention abt the 
extra space it might take per block.  Later we should be explaining it in 
detail in our book with more details.  This is will be an excellent perf 
booster for random read work loads.  

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438350#comment-15438350
 ] 

Yu Li commented on HBASE-16213:
---

My late +1, good job fella [~aoxiang], and thanks all for review [~anoop.hbase] 
[~ram_krish] [~stack]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437173#comment-15437173
 ] 

Anoop Sam John commented on HBASE-16213:


Planning to commit tomorrow my time unless I hear objections

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436653#comment-15436653
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


I had already +1ed this. +1.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436651#comment-15436651
 ] 

Hadoop QA commented on HBASE-16213:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
29m 58s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 92m 59s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 147m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12825430/HBASE-16213-master_v6.patch
 |
| JIRA Issue | HBASE-16213 |
| Optional Tests |  asflicense  javac  javad

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436630#comment-15436630
 ] 

Anoop Sam John commented on HBASE-16213:


+1. [~ram_krish], [~saint@gmail.com]  chance for one more +1?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-25 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436619#comment-15436619
 ] 

binlijin commented on HBASE-16213:
--

Upload E2E result in hfile_block_performance_E2E.pptx

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213-master_v6.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> hfile_block_performance_E2E.pptx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-24 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434967#comment-15434967
 ] 

binlijin commented on HBASE-16213:
--

Good.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, 
> HBASE-16213_v2.patch, cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434804#comment-15434804
 ] 

Anoop Sam John commented on HBASE-16213:


One thought after one more look
bq.private List rowsOffset = new ArrayList(64);
So we add all row offset into this List and then finally write all ints to the 
Hfile block's stream.  Every addition to List needs an Object creation (int to 
Integer autoboxing) and so many garbage.  We ca avoid this.
Instead of List we can create a ByteArrayOutputStream (See 
org.apache.hadoop.hbase.io.BAOS)  and write offsets in final serializing way 
and at the end   write getBuffer() at once.  The capacity of the BAOS can be 
initialized with 64 * 4. It will resize automatically as per the need.  Also 
#rows can be calculated as BAOS#size()/4
WDYT?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, 
> HBASE-16213_v2.patch, cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434803#comment-15434803
 ] 

Anoop Sam John commented on HBASE-16213:


One thought after one more look
bq.private List rowsOffset = new ArrayList(64);
So we add all row offset into this List and then finally write all ints to the 
Hfile block's stream.  Every addition to List needs an Object creation (int to 
Integer autoboxing) and so many garbage.  We ca avoid this.
Instead of List we can create a ByteArrayOutputStream (See 
org.apache.hadoop.hbase.io.BAOS)  and write offsets in final serializing way 
and at the end   write getBuffer() at once.  The capacity of the BAOS can be 
initialized with 64 * 4. It will resize automatically as per the need.  Also 
#rows can be calculated as BAOS#size()/4
WDYT?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, 
> HBASE-16213_v2.patch, cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434178#comment-15434178
 ] 

Hadoop QA commented on HBASE-16213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
2s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 24s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 39s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 33s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
39s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| Timed out junit tests | 
org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
|   | org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot |
|   | org.apache.hadoop.hbase.snapshot.TestMobExportSnapsh

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-23 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432918#comment-15432918
 ] 

binlijin commented on HBASE-16213:
--

Sorry sir, I am on vacation this days, i will check it tomorrow.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-23 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432922#comment-15432922
 ] 

binlijin commented on HBASE-16213:
--

Thank you sir,  I am on vacation this days,  i will check it tomorrow.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-23 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432622#comment-15432622
 ] 

Anoop Sam John commented on HBASE-16213:


Ya in my initial comments also I was suggesting we can do in new jiras.  But 
towards end there are lot of duplication. Would be better if the new DBE can 
some way extend the NoOp DBE equivalent.  Let us know if u need any help.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-23 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432445#comment-15432445
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


I saw comment in RB is about code refinement and about the duplicate code from 
Anoop.
Are you planning to change it [~aoxiang]? I think initially i suggested the 
same and I saw some refinement you had tried to do.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, 
> cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426796#comment-15426796
 ] 

Hadoop QA commented on HBASE-16213:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
8s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 32s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 42s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 42s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
35s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-18 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824354/HBASE-16213-master_v4.patch
 |
| JIRA Issue | HBASE-16213 |
| Optional Tests |  asflicense  javac  javado

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426218#comment-15426218
 ] 

Hadoop QA commented on HBASE-16213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
11s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 49s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 42s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 31s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 18s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.io.hfile.TestSeekTo |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-18 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12824312/HBASE-16213-master_v3.patch

[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-18 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426130#comment-15426130
 ] 

binlijin commented on HBASE-16213:
--

Thanks very much @ram.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, 
> HBASE-16213_v2.patch, hfile-cpu.png, hfile_block_performance.pptx, 
> hfile_block_performance2.pptx, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-18 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426127#comment-15426127
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


I verfied the updated patch and the comments seems to be fixed. I have not 
checked the logic of how the bytebuffers are traversed back and forth and I 
believe the test cases would have caught them. 
May be there are lot of garbage getting generated because of lot of slices and 
duplicates. I think that can be seen later if really it is a problem. As of now 
am fine with this patch. +1.


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-11 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417009#comment-15417009
 ] 

Anoop Sam John commented on HBASE-16213:


Ya go for the code refinement.  Pls do V1 first which is not having any space 
optimization things.  We can add V2 and/or V3 later once this is in.  Ya this 
is excellent work.. 

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-11 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416999#comment-15416999
 ] 

Yu Li commented on HBASE-16213:
---

We have a default DBE (NONE) already right? I also agree to implement this as a 
kind of DBE, but IMHO it's fair to set this new implementation as the default 
DBE instead of NONE, and user could still explicitly use other 
DATA_BLOCK_ENCODING (set in HColumnDescriptor) if would like to. :-)

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-11 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416969#comment-15416969
 ] 

Yu Li commented on HBASE-16213:
---

Nice work and nice testing. Since still need to wait for our machines to be 
ready for further E2E perf testing (due to machine room relocation), I suggest 
to start code refinement according to review comments from Ram and others, and 
update patch on review board [~aoxiang]

btw, would be great to see performance data from your side sir [~stack], thanks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416612#comment-15416612
 ] 

Anoop Sam John commented on HBASE-16213:


bq. Make it default
Am not sure.  Because it is implemented as a type of DBE.  That means we make 
one DBE as default.  This will help in random get but not much in range scan. 
Also one more thing to note that is when this is used users can not use other 
DBE optimizations (space saving)..  Ya that is true also.. Because all DBE 
impls rely on the fact that the reads are linear over an HFile block.  They key 
and/or value of one KV can be obtained by reading all the previous cells in the 
block.   So implementing this as a kind of DBE also correct IMO.

We should get this in to 2.0.  Good one.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416261#comment-15416261
 ] 

binlijin commented on HBASE-16213:
--

Yes sir, you can try alter table DATA_BLOCK_ENCODING from 'NONE' to 
‘ROW_INDEX_V1' to test it.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416245#comment-15416245
 ] 

binlijin commented on HBASE-16213:
--

Yes.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416244#comment-15416244
 ] 

stack commented on HBASE-16213:
---

This makes sense to me. The latency improvements are probably so small as to go 
unnoticed (relative) but yeah, less savings in a bunch of cpu... I can measure 
too

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416234#comment-15416234
 ] 

binlijin commented on HBASE-16213:
--

I will test it when the machine is ready.
The last time when i get the hfile-cpu.png, i do not see the throughput 
improved, but cut down much CPU usage.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416230#comment-15416230
 ] 

binlijin commented on HBASE-16213:
--

Yes sir,  we can turn this feature on always when this is stable.
Yes,the data is all cached.


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416213#comment-15416213
 ] 

stack commented on HBASE-16213:
---

Why not turn this feature on always? If keys of 16bytes in value only add 5% to 
the size -- this is worst case -- the benefit far outweighs this extra size.

When you say seek, you are seeking cached data?

Nice numbers [~aoxiang]

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416205#comment-15416205
 ] 

stack commented on HBASE-16213:
---

I would imagine it makes a lot of difference in cpu usage when random reads; 
this linear seek is one of our main cpu consumers random reading.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415690#comment-15415690
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


bq.Also this may not improve E2E throughout or latency much, but may cut down 
the CPU usage.
I am not sure I get this. For random reads it is the seek which takes time, if 
that is improved then we should see some perf gain in terms of throughput or 
latency. 
Or you mean the same % will not be felt in E2E performance rather it would be 
much more reduced but still visible?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415217#comment-15415217
 ] 

binlijin commented on HBASE-16213:
--

Also this may not improve E2E throughout or latency much, but may cut down the 
CPU usage.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415202#comment-15415202
 ] 

binlijin commented on HBASE-16213:
--

Will do the E2E test later.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415197#comment-15415197
 ] 

binlijin commented on HBASE-16213:
--

Yes, many of the code should be reused, at first i make it work quickly.
I do not think it have any specific test case for these new type.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415193#comment-15415193
 ] 

binlijin commented on HBASE-16213:
--

I put meta data size  and overhead %(meta size/data size) in another table in 
the 1st page. Because it has too much columns to put in one table.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414971#comment-15414971
 ] 

Anoop Sam John commented on HBASE-16213:


To add to what I said abt test report, it would be great to note down the E2E 
throughout gain and/or latency reduction with this.  Not just a seek related 
improvement.  When u get a chance pls do it for the diff data size and block 
size and add to this ppt.  Thanks man.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-10 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414885#comment-15414885
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


Perf improvement is great. With smaller blocks and bigger value size impact is 
lesser as only very few rows are to be found so that seek is not taking time. 
The meta data overhead is at the max 4k more I think. 
HAving multiple columns for the same row also should go with the same meta data 
overhead only (if the total size is going to account to approx 1K).
Went through the patch. 
Some of the tag related decode and encode can be moved to a subclass and avoid 
duplicate with the existing code I think.
And see if the SeekState's Cell impl should be all together new in the new 
EncodedSeeker state implementation. May be they can be reused. I have not 
checked if there is something different so that it is not getting reused.
I think all the existing tests for DBE would work with this because the new DBE 
enum will iterate through all. Do you need any specific test case for these new 
types?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-08-09 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414747#comment-15414747
 ] 

Anoop Sam John commented on HBASE-16213:


Thanks for the detailed test and info.   The seek perf improvement is 
excellent.  The size delta because of row offset meta data seems very small 
when cell size is not so small (>100 bytes I consider)  We need this info so 
that the doc around this feature can explain it well how much block size 
getting increased because of this new type of DBE.  Small suggestion in your 
ppt 1st page, just make one more column and explicitly add meta data size.  
Else every time read, we have to subtract 2 values.  And may be add a field 
like overhead % also.  (ie.   (Total size - data size)/data size)

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400366#comment-15400366
 ] 

binlijin commented on HBASE-16213:
--

alter 'table_put_10B_100w', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', 
BLOCKSIZE => '65536'}
major_compact 'table_put_10B_100w'
We can alter the table with different BLOCKSIZE also.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400363#comment-15400363
 ] 

binlijin commented on HBASE-16213:
--

When i do the test
(1)create "table_put_10B_100w", "cf"
(2)  use ycsb write to table  "table_put_10B_100w" 100w rows, and value=16B,  
BLOCKSIZE => '65536'
(3)flush table 'table_put_10B_100w' 

  (4) alter 'table_put_10B_100w',{NAME => 'cf',DATA_BLOCK_ENCODING => 'NONE'}   
major_compact 'table_put_10B_100w'
we can do the test with DATA_BLOCK_ENCODING => 'NONE'
  (5)  alter 'table_put_10B_100w',{NAME => 'cf',DATA_BLOCK_ENCODING => 
'ROW_INDEX_V1'}   major_compact 'table_put_10B_100w'
   we can do the test with DATA_BLOCK_ENCODING => ‘ROW_INDEX_V1'


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400208#comment-15400208
 ] 

stack commented on HBASE-16213:
---

This is hard to test [~aoxiang] or at least hard to do an apples to apples 
compare because I have to write files with the indexes on them first, right? 
And then I'd run a random read load that ignored the indices vs one that 
didn't. Its a config? Thanks.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385305#comment-15385305
 ] 

stack commented on HBASE-16213:
---

Those are nice numbers [~aoxiang] Let me give it a go

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385304#comment-15385304
 ] 

stack commented on HBASE-16213:
---

It is ok [~carp84] Branch-1 is good. I have a little rig w/ branch-1 so I can 
try it out.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385225#comment-15385225
 ] 

Yu Li commented on HBASE-16213:
---

Let's update the patch for master branch (instead of branch-1) on RB to make it 
easier for review fella :-)

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385212#comment-15385212
 ] 

binlijin commented on HBASE-16213:
--

Upload a cpu usage comparison.
use ycsb test it, key=10B, value=16B, 100w, BLOCKSIZE => '65536'
NONE32%= user 28% + system 4%
ROW_INDEX_V118% = user 14% + system 4%

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385092#comment-15385092
 ] 

binlijin commented on HBASE-16213:
--

Yes, sir.
I use EncodedSeekPerformanceTest do some performance with it, and already post 
the result.

I talk it with Anoop Sam John, and he see we only need ROW_INDEX_V1.

I think this feature is more useful for tables with many small cells.

I think you see the patch for branch 1 and i already change it.


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384547#comment-15384547
 ] 

stack commented on HBASE-16213:
---

[~aoxiang] You see above?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, new-hfile-block.xlsx
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.
> I use EncodedSeekPerformanceTest test the performance.
> First use ycsb write 100w data, every row have only one qualifier, and 
> valueLength=16B/64/256B/1k.
> Then use EncodedSeekPerformanceTest to test random read 1w or 100w row, and 
> also record HFileBlock's dataSize/dataWithMetaSize in the encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383619#comment-15383619
 ] 

stack commented on HBASE-16213:
---

Nice. Seek in the row when random reading is one of the main consumers of CPU.

Why bother having two encoders? Why not just one that does row and column 
family index?

Any idea on how much more work we are doing when this is enabled (CPU?). Is it 
less with this feature on or more? Under what circumstances do you think?

Let me try this.  Meantime here are some comments on the patch:

In class comment, either in encoder or decoder, describe how the encoding 
works, what layout looks like with some advice on when to use it. Can then copy 
paste as the release note on this issue.

For...

builder.write(cell);

Could the above return a length so you don't have to reget it on the next line 
with:

int size = KeyValueUtil.length(cell);

The length parse costs.

Anywhere that you can get count of how many kvs in block that you can use here:

  List kvs = new ArrayList();

Remove these...

// TODO Auto-generated method stub

Put these together?


102   LOG.trace("RowNumber: " + rowsOffset.size());
103   LOG.trace("onDiskSize: " + onDiskSize);

One line is easier to read than two...

Got half way through... will be back w/ more. Nice.











> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378783#comment-15378783
 ] 

binlijin commented on HBASE-16213:
--

ROW_INDEX_V2 will store column family only once in a HFileBlock.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378782#comment-15378782
 ] 

binlijin commented on HBASE-16213:
--

[~tedyu]
org.apache.hadoop.hbase.io.encoding.TestEncodedSeekers
org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
org.apache.hadoop.hbase.io.encoding.TestDataBlockEncoders
org.apache.hadoop.hbase.io.encoding.TestSeekToBlockWithEncoders
The four unit tests already test the ROW_INDEX_V1.
For larger key/values(key=10B, value=1k), there is 10% improvements. 


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378779#comment-15378779
 ] 

binlijin commented on HBASE-16213:
--

ROW_INDEX_V2 store column family only once, so the overhead will do not have so 
much, but current do not have this version for master.
And the third version will store every row only once, and this version do not 
have implement now.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378053#comment-15378053
 ] 

Anoop Sam John commented on HBASE-16213:


May be u can consider key of 50 bytes size and value of 256 bytes.  In such 
case #rows per HFileBlock will be lesser and so the gain may be less.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377886#comment-15377886
 ] 

Ted Yu commented on HBASE-16213:


Lijin:
Can you add unit tests for ROW_INDEX_V1 ?

In the comparison, both key and value are 10B in size.
Can you perform comparison on larger key / values ?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376833#comment-15376833
 ] 

Anoop Sam John commented on HBASE-16213:


Thanks.
So when we have per row size of say 256 bytes, we will have a 1 KB overhead of 
storing these offsets (considering 64 KB block size)
So in places like calc bucket cache bucket sizes, user will have to consider 
this meta data overhead also.


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374627#comment-15374627
 ] 

binlijin commented on HBASE-16213:
--

The first version do not have any storage optimization, the perf comparison is 
the comparison between the new algorithm with NO encoding.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374626#comment-15374626
 ] 

binlijin commented on HBASE-16213:
--

The first version do not have any storage optimization, the perf comparison is 
the comparison between the new algorithm with NO encoding.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-13 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374617#comment-15374617
 ] 

binlijin commented on HBASE-16213:
--

Patch for master on RB https://reviews.apache.org/r/49980/

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374614#comment-15374614
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:


So this is basically a new encoder algorithm added. So in the perf comparison 
the comparison was done between the new algorithm with NO encoding or any of 
the existing Encoding algorithm ?

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-12 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372779#comment-15372779
 ] 

binlijin commented on HBASE-16213:
--

Current the patch base on branch-1

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213.patch, HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372697#comment-15372697
 ] 

Anoop Sam John commented on HBASE-16213:


Interesting..  Seem the attached patch is not based on master code base. Can 
you attach one based on master? RB will be better..

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213.patch, HBASE-16213_v2.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-12 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372419#comment-15372419
 ] 

binlijin commented on HBASE-16213:
--

Perf Test
Write 1000W rows to table_put_10B_1000w,  key=10B length, value=10B length, 64K 
hfileblock size.
Use ycsb to test the performance.
Only one regionserver, one client(and one thread), and run client on the 
regionserver's machine, result is:
Throughput(ops/sec), 9034 NONE
Throughput(ops/sec), 12817   ROW_INDEX_V1


> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16213) A new HFileBlock structure for fast random get

2016-07-11 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372193#comment-15372193
 ] 

binlijin commented on HBASE-16213:
--

ROW_INDEX_V1 is the first version.
ROW_INDEX_V2 store column family only once.

> A new HFileBlock structure for fast random get
> --
>
> Key: HBASE-16213
> URL: https://issues.apache.org/jira/browse/HBASE-16213
> Project: HBase
>  Issue Type: New Feature
>Reporter: binlijin
> Attachments: HBASE-16213.patch
>
>
> HFileBlock store cells sequential, current when to get a row from the block, 
> it scan from the first cell until the row's cell.
> The new structure store every row's start offset with data, so it can find 
> the exact row with binarySearch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)