[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2016-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692392#comment-15692392
 ] 

stack commented on HBASE-10676:
---

HBASE-17072 actually includes the AtomicReference replacement for ThreadLocal 
part of this patch. Thanks.

> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
> perforamce of scan
> 
>
> Key: HBASE-10676
> URL: https://issues.apache.org/jira/browse/HBASE-10676
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.99.0
>Reporter: zhaojianbo
>Assignee: zhaojianbo
> Attachments: HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
> HBASE-10676-0.98-branchV2.patch
>
>
> The prefetchedHeader variable in HFileBlock.FSReaderV2 exists to avoid 
> backward seeks, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this 
> block's header as part of the previous read's look-ahead. And we also want to 
> skip reading the header again if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is a 
> ThreadLocal per storefile reader, and during the lifecycle of a 
> RegionScanner, different RPC handlers serve the scan requests of the same 
> scanner. Even if the handler of the previous scan call prefetched the next 
> block header, a different handler serving the current scan call will still 
> trigger a backward seek. The process is like this:
> # rs handler1 serves the scan call, reads block1, and prefetches the header of block2.
> # rs handler2 serves the same scanner's next scan call. Because rs handler2 does not know that rs handler1 already prefetched the header of block2, it triggers a backward seek, reads block2, and prefetches the header of block3.
> So the read is not sequential. I therefore think the ThreadLocal is useless 
> and should be removed. I made the change and evaluated the performance with 
> one, two, and four clients scanning the same region with one storefile. The 
> test environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on one machine
> # An HBase cluster with a ZooKeeper, a master, and a regionserver on the same machine
> # The clients also run on the same machine.
> So all the data is local. The storefile is about 22.7GB from our online data, 
> 18995949 KVs. Caching is set to 1000, and setCacheBlocks(false) is used.
> With the improvement, the total client scan time decreases by 21% in the 
> one-client case and by 11% in the two-client case. The four-client case is 
> almost the same. The detailed test data follows:
> ||case||client||time(ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
> After some modifications (see the comments below), the newest results are:
> ||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
> |original|1|306222|new with synchronized|1|239510|new with AtomicReference|1|241243|
> |original|2|416390|new with synchronized|2|365367|new with AtomicReference|2|368952|
> |original|4|555986|new with synchronized|4|540642|new with AtomicReference|4|545715|
> |original|8|854029|new with synchronized|8|852137|new with AtomicReference|8|850401|
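
The handler hand-off described in the quoted issue can be reproduced outside HBase. The following is only a minimal sketch, not the HBase code: `HeaderSketch`, `PrefetchVisibilityDemo`, and `simulate` are hypothetical names, and each "RPC handler" is assumed to be a plain Java thread. It shows that a value stored in a ThreadLocal by one thread is invisible to the next thread, while a shared AtomicReference is visible.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a prefetched block header; not the HBase class.
final class HeaderSketch {
    final long offset;
    HeaderSketch(long offset) { this.offset = offset; }
}

final class PrefetchVisibilityDemo {
    // Per-thread slot, as in the 0.98 behaviour described in the issue.
    static final ThreadLocal<HeaderSketch> perThread = new ThreadLocal<>();
    // Shared slot, as in the proposed AtomicReference variant.
    static final AtomicReference<HeaderSketch> shared = new AtomicReference<>();

    // Runs the task on a fresh thread and waits for it,
    // imitating one RPC handler serving one scan call.
    static void runOnHandler(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns {threadLocalHit, sharedHit} as observed by handler2
    // after handler1 has "prefetched" the header at block2Offset.
    static boolean[] simulate(long block2Offset) {
        shared.set(null);
        runOnHandler(() -> {
            HeaderSketch h = new HeaderSketch(block2Offset);
            perThread.set(h); // visible to handler1 only
            shared.set(h);    // visible to every handler
        });
        boolean[] hits = new boolean[2];
        runOnHandler(() -> {
            HeaderSketch tl = perThread.get();
            HeaderSketch sh = shared.get();
            hits[0] = tl != null && tl.offset == block2Offset;
            hits[1] = sh != null && sh.offset == block2Offset;
        });
        return hits;
    }
}
```

In this sketch the second handler misses the ThreadLocal slot, which corresponds to the case that forces the backward seek, and hits the shared slot.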



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2016-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688288#comment-15688288
 ] 

Hadoop QA commented on HBASE-10676:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} HBASE-10676 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12633590/HBASE-10676-0.98-branchV2.patch |
| JIRA Issue | HBASE-10676 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4588/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-04-15 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970355#comment-13970355
 ] 

zhaojianbo commented on HBASE-10676:


Hi, Lars Hofhansl. What do you think?



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-30 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954930#comment-13954930
 ] 

zhaojianbo commented on HBASE-10676:


{quote}
The AtomicReference version looks better. But why isn't the performance 
improvement obvious with 4/8 clients?
{quote}
The purpose of the patch is to keep storefile reads as sequential as possible 
when only a few clients scan the same storefile. With 4/8 clients it is hard 
to keep the reads sequential, so the result is the same as with the original 
version.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-30 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954941#comment-13954941
 ] 

haosdent commented on HBASE-10676:
--

[~zhaojianbo] Thank you for the explanation.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-29 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954338#comment-13954338
 ] 

haosdent commented on HBASE-10676:
--

The AtomicReference version looks better. But why isn't the performance 
improvement obvious with 4/8 clients?



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-28 Thread hongliang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950565#comment-13950565
 ] 

hongliang commented on HBASE-10676:
---

it's great



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948164#comment-13948164
 ] 

stack commented on HBASE-10676:
---

Patch lgtm. [~lhofhansl] What do you think, boss?



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-18 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939108#comment-13939108
 ] 

zhaojianbo commented on HBASE-10676:


Which patch should be kept?
Ping for review... :-)

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
 HBASE-10676-0.98-branchV2.patch


 PrefetchedHeader variable in HFileBlock.FSReaderV2 is used for avoiding 
 backward seek operation as the comment said:
 {quote}
 we will not incur a backward seek operation if we have already read this 
 block's header as part of the previous read's look-ahead. And we also want to 
 skip reading the header again if it has already been read.
 {quote}
 But that is not the case. In the code of 0.98, prefetchedHeader is 
 threadlocal for one storefile reader, and in the RegionScanner 
 lifecycle,different rpc handlers will serve scan requests of the same 
 scanner. Even though one handler of previous scan call prefetched the next 
 block header, the other handlers of current scan call will still trigger a 
 backward seek operation. The process is like this:
 # rs handler1 serves the scan call, reads block1 and prefetches the header of 
 block2
 # rs handler2 serves the same scanner's next scan call, because rs handler2 
 doesn't know the header of block2 already prefetched by rs handler1, triggers 
 a backward seek and reads block2, and prefetches the header of block3.
 It is not the sequential read. So I think that the threadlocal is useless, 
 and should be abandoned. I did the work, and evaluated the performance of one 
 client, two client and four client scanning the same region with one 
 storefile.  The test environment is
 # A hdfs cluster with a namenode, a secondary namenode , a datanode in a 
 machine
 # A hbase cluster with a zk, a master, a regionserver in the same machine
 # clients are also in the same machine.
 So all the data is local. The storefile is about 22.7GB from our online data, 
 18995949 kvs. Caching is set 1000. And setCacheBlocks(false)
 With the improvement, the client total scan time decreases 21% for the one 
 client case, 11% for the two clients case. But the four clients case is 
 almost the same. The details tests' data is the following:
 ||case||client||time(ms)||
 | original | 1 | 306222 |
 | new | 1 | 241313 |
 | original | 2 | 416390 |
 | new | 2 | 369064 |
 | original | 4 | 555986 |
 | new | 4 | 562152 |
 With some modifications (see the comments below), the newest results are:
 ||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
 |original|1|306222|new with synchronized|1|239510|new with AtomicReference|1|241243|
 |original|2|416390|new with synchronized|2|365367|new with AtomicReference|2|368952|
 |original|4|555986|new with synchronized|4|540642|new with AtomicReference|4|545715|
 |original|8|854029|new with synchronized|8|852137|new with AtomicReference|8|850401|
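
The two-handler sequence in the description can be reproduced in a few lines. This is a minimal sketch, not the HBase code: the classes `PrefetchedHeader` and `HeaderCacheDemo`, the block offset, and the header size are hypothetical stand-ins. It only illustrates that a value stored in a `ThreadLocal` by one handler thread is invisible to the next handler thread, while a shared `AtomicReference` is visible to both.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder: the file offset a header belongs to, plus its bytes.
final class PrefetchedHeader {
    final long offset;
    final byte[] header;

    PrefetchedHeader(long offset, byte[] header) {
        this.offset = offset;
        this.header = header;
    }
}

final class HeaderCacheDemo {
    // Per-thread slot, which is how the description says 0.98 behaves.
    static final ThreadLocal<PrefetchedHeader> perThread = new ThreadLocal<>();
    // Shared slot, in the spirit of the AtomicReference variant of the patch.
    static final AtomicReference<PrefetchedHeader> shared = new AtomicReference<>();

    // Runs the task on a fresh thread (standing in for an RPC handler) and waits for it.
    static void runOnHandler(String name, Runnable task) {
        Thread t = new Thread(task, name);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns {threadLocalHit, sharedHit} as observed by the second handler.
    static boolean[] simulate() {
        final long block2Offset = 65536L; // assumed offset of block2
        final boolean[] hits = new boolean[2];
        shared.set(null);

        // handler1 reads block1 and prefetches the header of block2.
        runOnHandler("handler1", () -> {
            // 33 bytes is an illustrative header size only.
            PrefetchedHeader h = new PrefetchedHeader(block2Offset, new byte[33]);
            perThread.set(h);
            shared.set(h);
        });

        // handler2 serves the next scan call and looks for block2's header.
        runOnHandler("handler2", () -> {
            PrefetchedHeader local = perThread.get();
            PrefetchedHeader global = shared.get();
            hits[0] = local != null && local.offset == block2Offset;
            hits[1] = global != null && global.offset == block2Offset;
        });
        return hits;
    }

    public static void main(String[] args) {
        boolean[] hits = simulate();
        System.out.println("ThreadLocal hit: " + hits[0]);
        System.out.println("AtomicReference hit: " + hits[1]);
    }
}
```

In this sketch handler2 misses on the `ThreadLocal` slot and hits on the shared slot, which corresponds to the backward seek in step 2 of the process above.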



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-09 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925190#comment-13925190
 ] 

zhaojianbo commented on HBASE-10676:


Hi all,
  I think I found the reason. In the new code, a scan call of one scanner still 
reads the header again even though the scanner's previous scan call already 
prefetched it, because a scan call of another scanner has overwritten the shared 
prefetchedHeader in the meantime. This small difference increases the HDFS read 
time. So I adjusted the patch: it introduces a global prefetchedHeader, which 
avoids the backward seek when different RPC handlers serve one scanner, and 
keeps the prefetchedHeaderForThread variable, which avoids re-reading a header 
that was prefetched in the previous scan call. I then ran all cases again, and 
this time the results are OK. I made two patches, one using synchronized and one 
using AtomicReference. The results of the synchronized and AtomicReference 
versions are almost the same.
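
A rough sketch of the two-level idea described in this comment, under stated assumptions: the class names `CachedHeader` and `TwoLevelHeaderCache`, the method names, and the lookup order (global slot first, per-thread slot second) are hypothetical and are not taken from the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for one prefetched header.
final class CachedHeader {
    final long offset;
    final byte[] bytes;

    CachedHeader(long offset, byte[] bytes) {
        this.offset = offset;
        this.bytes = bytes;
    }
}

final class TwoLevelHeaderCache {
    // Shared by all handler threads: covers one scanner served by different handlers.
    private final AtomicReference<CachedHeader> global = new AtomicReference<>();
    // Per handler thread: survives when another scanner overwrites the global slot.
    private final ThreadLocal<CachedHeader> forThread = new ThreadLocal<>();

    // Records the header of the next block after a block read.
    void remember(long offset, byte[] bytes) {
        CachedHeader h = new CachedHeader(offset, bytes);
        global.set(h);
        forThread.set(h);
    }

    // Returns the cached header for this offset, or null if it must be read from the file.
    byte[] lookup(long offset) {
        CachedHeader g = global.get();
        if (g != null && g.offset == offset) {
            return g.bytes;
        }
        CachedHeader t = forThread.get();
        if (t != null && t.offset == offset) {
            return t.bytes;
        }
        return null;
    }

    // Helper for experiments: runs the task on another thread and waits for it.
    static void runOnOtherThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout, a handler that remembered the header at offset 100 still finds it after another thread has remembered a header at offset 200, because the per-thread slot is untouched by the other thread.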

 The result is:
||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
|original|1|306222|new with synchronized|1|239510|new with AtomicReference|1|241243|
|original|2|416390|new with synchronized|2|365367|new with AtomicReference|2|368952|
|original|4|555986|new with synchronized|4|540642|new with AtomicReference|4|545715|
|original|8|854029|new with synchronized|8|852137|new with AtomicReference|8|850401|

The new version's time decreases by 21% for one client and by 12% for two 
clients. The time is about the same for 4 and 8 clients.
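
The percentages can be checked directly against the table. A small sketch follows; the class and method names are made up for illustration, and the figures are the ones reported in this comment.

```java
final class ScanTimeDelta {
    // Percentage by which patchedMs is lower than originalMs; negative means slower.
    static double percentDecrease(long originalMs, long patchedMs) {
        return 100.0 * (originalMs - patchedMs) / originalMs;
    }

    public static void main(String[] args) {
        long[] clients = {1, 2, 4, 8};
        long[] original = {306222, 416390, 555986, 854029};
        long[] withSynchronized = {239510, 365367, 540642, 852137};
        long[] withAtomicReference = {241243, 368952, 545715, 850401};
        for (int i = 0; i < clients.length; i++) {
            System.out.printf("%d client(s): synchronized %.1f%%, AtomicReference %.1f%%%n",
                clients[i],
                percentDecrease(original[i], withSynchronized[i]),
                percentDecrease(original[i], withAtomicReference[i]));
        }
    }
}
```

For the synchronized version this works out to roughly 21.8% for one client and 12.3% for two clients, and under 3% for 4 and 8 clients.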

I have uploaded the two new patches and updated the description with the newest 
results.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925228#comment-13925228
 ] 

Ted Yu commented on HBASE-10676:


I took a look at HBASE-10676-0.98-branchV2.patch which looks good.

Nice improvement.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
 HBASE-10676-0.98-branch.patch, HBASE-10676-0.98-branchV2.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925239#comment-13925239
 ] 

Hadoop QA commented on HBASE-10676:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633590/HBASE-10676-0.98-branchV2.patch
  against trunk revision .
  ATTACHMENT ID: 12633590

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8934//console

This message is automatically generated.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
 HBASE-10676-0.98-branch.patch, HBASE-10676-0.98-branchV2.patch


 

[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-09 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925421#comment-13925421
 ] 

Andrew Purtell commented on HBASE-10676:


If the results are very similar, the AtomicReference version looks cleaner.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
 HBASE-10676-0.98-branch.patch, HBASE-10676-0.98-branchV2.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-09 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925472#comment-13925472
 ] 

zhaojianbo commented on HBASE-10676:


Thanks, Ted Yu and Andrew Purtell.
I'm a newcomer. What should I do next? Will the patch be merged? 

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
 HBASE-10676-0.98-branch.patch, HBASE-10676-0.98-branchV2.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-06 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922417#comment-13922417
 ] 

zhaojianbo commented on HBASE-10676:


I have finished the AtomicReference version. 
The total results are:
||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
|original|1|306222|new|1|241313|new with AtomicReference|1|241236|
|original|2|416390|new|2|369064|new with AtomicReference|2|393935|
|original|4|555986|new|4|562152|new with AtomicReference|4|647195|
|original|8|854029|new|8|927244|new with AtomicReference|8|943768|

The client time of the AtomicReference version also increases for 4 and 8 
clients, so it looks like the same problem.
I am still analyzing why the client time increases.
Any ideas?

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-06 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922610#comment-13922610
 ] 

zhaojianbo commented on HBASE-10676:


All tests were run twice, and the results were averaged.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch-AtomicReference.patch, 
 HBASE-10676-0.98-branch.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920853#comment-13920853
 ] 

Jean-Marc Spaggiari commented on HBASE-10676:
-

Nice, any impact on the other operations? Like get?

I can run it on PE for a day if you want.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch.patch




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921271#comment-13921271
 ] 

Nick Dimiduk commented on HBASE-10676:
--

Does your box have SSD or spinning disk?

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch.patch


 PrefetchedHeader variable in HFileBlock.FSReaderV2 is used for avoiding 
 backward seek operation as the comment said:
 {quote}
 we will not incur a backward seek operation if we have already read this 
 block's header as part of the previous read's look-ahead. And we also want to 
 skip reading the header again if it has already been read.
 {quote}
 But that is not the case. In the code of 0.98, prefetchedHeader is 
 threadlocal for one storefile reader, and in the RegionScanner 
 lifecycle,different rpc handlers will serve scan requests of the same 
 scanner. Even though one handler of previous scan call prefetched the next 
 block header, the other handlers of current scan call will still trigger a 
 backward seek operation. The process is like this:
 # rs handler1 serves the scan call, reads block1 and prefetches the header of 
 block2
 # rs handler2 serves the same scanner's next scan call, because rs handler2 
 doesn't know the header of block2 already prefetched by rs handler1, triggers 
 a backward seek and reads block2, and prefetches the header of block3.
 It is not the sequential read. So I think that the threadlocal is useless, 
 and should be abandoned. I did the work, and evaluated the performance of one 
 client, two client and four client scanning the same region with one 
 storefile.  The test environment is
 # A hdfs cluster with a namenode, a secondary namenode , a datanode in a 
 machine
 # A hbase cluster with a zk, a master, a regionserver in the same machine
 # clients are also in the same machine.
 So all the data is local. The storefile is about 22.7GB from our online data, 
 18995949 kvs. Caching is set 1000.
 With the improvement, the client total scan time decreases 21% for the one 
 client case, 11% for the two clients case. But the four clients case is 
 almost the same. The details tests' data is the following:
 ||case||client||time(ms)||
 | original | 1 | 306222 |
 | new | 1 | 241313 |
 | original | 2 | 416390 |
 | new | 2 | 369064 |
 | original | 4 | 555986 |
 | new | 4 | 562152 |
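
The handler hand-off described in the issue text above can be illustrated with a small, self-contained sketch. It is not HBase code: the class ThreadLocalHeaderDemo, the Header type and the block offsets are hypothetical, and each single-threaded executor merely stands in for one RPC handler. It only shows why a header cached in a ThreadLocal is invisible to the next handler.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalHeaderDemo {
    // Hypothetical stand-in for the per-thread prefetched header; not HBase code.
    static final class Header {
        long offset = -1; // offset of the block whose header is cached, -1 if none
    }

    private static final ThreadLocal<Header> PREFETCHED =
        ThreadLocal.withInitial(Header::new);

    // Reads the block at `offset` and "prefetches" the header of the block at
    // `nextOffset`. Returns true if this thread already held the header for `offset`.
    static boolean readBlock(long offset, long nextOffset) {
        Header h = PREFETCHED.get();
        boolean hit = (h.offset == offset);
        h.offset = nextOffset;
        return hit;
    }

    // Runs one read on a single-threaded executor, which models one RPC handler.
    static boolean readOn(ExecutorService handler, long offset, long nextOffset) {
        Callable<Boolean> task = () -> readBlock(offset, nextOffset);
        try {
            return handler.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {hit when handler2 continues the scan, hit when handler1 continues it}.
    public static boolean[] demo() {
        ExecutorService handler1 = Executors.newSingleThreadExecutor();
        ExecutorService handler2 = Executors.newSingleThreadExecutor();
        try {
            readOn(handler1, 0L, 100L); // handler1 reads block1, prefetches block2's header
            boolean otherHandler = readOn(handler2, 100L, 200L); // handler2 never saw that header
            boolean sameHandler = readOn(handler1, 100L, 200L);  // handler1 still holds it
            return new boolean[] {otherHandler, sameHandler};
        } finally {
            handler1.shutdown();
            handler2.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("handler2 header hit: " + r[0]);
        System.out.println("handler1 header hit: " + r[1]);
    }
}
```

In this sketch only the handler that did the look-ahead gets a header hit; any other handler misses, which corresponds to the backward seek described in step 2.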



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921318#comment-13921318
 ] 

stack commented on HBASE-10676:
---

Patch lgtm



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921396#comment-13921396
 ] 

Jean-Marc Spaggiari commented on HBASE-10676:
-

{quote}
Does your box have SSD or spinning disk?
{quote}
SSD on master only. RS are spinning disks.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921410#comment-13921410
 ] 

Lars Hofhansl commented on HBASE-10676:
---

We should also test the scenario when most data is filtered at the server (such 
as in Phoenix). 



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921461#comment-13921461
 ] 

Nick Dimiduk commented on HBASE-10676:
--

bq. Does your box have SSD or spinning disk?

That was a question for [~zhaojianbo] re: the perf numbers posted.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921816#comment-13921816
 ] 

Lars Hofhansl commented on HBASE-10676:
---

Do we have to make a copy when we read the previous header out, or can we use an 
AtomicReference?
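
One way to read the question above is as a single shared slot that handlers swap atomically. The following is a minimal sketch under that assumption, not the patch itself: SharedHeaderCache, PrefetchedHeader as written here, store and lookup are all hypothetical names. Because each stored snapshot is never modified after construction, a reader can use the object it gets from the reference without copying it first.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SharedHeaderCache {
    // Snapshot that is never modified after construction (hypothetical type).
    public static final class PrefetchedHeader {
        final long offset;   // offset of the block this header belongs to
        final byte[] header; // header bytes; callers must treat them as read-only

        PrefetchedHeader(long offset, byte[] header) {
            this.offset = offset;
            this.header = header;
        }
    }

    // One slot shared by every thread that uses this reader.
    private final AtomicReference<PrefetchedHeader> cache =
        new AtomicReference<>(new PrefetchedHeader(-1L, new byte[0]));

    // Publishes the header found by look-ahead; replaces the previous snapshot.
    public void store(long offset, byte[] header) {
        cache.set(new PrefetchedHeader(offset, header.clone()));
    }

    // Returns the cached header for `offset`, or null if a different block is cached.
    public byte[] lookup(long offset) {
        PrefetchedHeader ph = cache.get();
        return ph.offset == offset ? ph.header : null;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHeaderCache cache = new SharedHeaderCache();
        // "handler1" publishes the header of the block at offset 100.
        Thread handler1 = new Thread(() -> cache.store(100L, new byte[] {1, 2, 3}));
        handler1.start();
        handler1.join();
        // "handler2" is a different thread and still finds it.
        Thread handler2 = new Thread(() ->
            System.out.println("handler2 header hit: " + (cache.lookup(100L) != null)));
        handler2.start();
        handler2.join();
    }
}
```

A limitation of this sketch is that it holds exactly one entry, so concurrent scanners on the same file would keep replacing each other's snapshot.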




[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921891#comment-13921891
 ] 

zhaojianbo commented on HBASE-10676:


Hi all,
I have finished the 8-client case. The results are:
||case||client||time(ms)||
| original | 8 | 854029 |
| new | 8 | 927244 |

The results show that, with the modification, the total client time for the 
8-client case increases by 8%. I checked the time distribution, and it seems 
that the time spent in readAtOffset increased. I will find out why.
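
For reference, the percentages quoted in this thread can be recomputed from the posted timings. The sketch below uses only the numbers from the tables; the class and method names are hypothetical. For the 8-client row it gives roughly 8.6%, which is in line with the 8% stated above.

```java
public class ScanTimeChange {
    // Relative change of the new time against the original, in percent.
    // Negative values mean the patched build was faster.
    public static double percentChange(long originalMs, long newMs) {
        return 100.0 * (newMs - originalMs) / originalMs;
    }

    public static void main(String[] args) {
        // {clients, original time (ms), new time (ms)} as posted on the issue
        long[][] runs = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
            {8, 854029, 927244},
        };
        for (long[] r : runs) {
            System.out.printf("%d client(s): %+.1f%%%n", r[0], percentChange(r[1], r[2]));
        }
    }
}
```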






[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921899#comment-13921899
 ] 

zhaojianbo commented on HBASE-10676:


{quote}
Nice, any impact on the other operations? Like get?
I can run it on PE for a day if you want.
{quote}
Thanks for your help.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921905#comment-13921905
 ] 

zhaojianbo commented on HBASE-10676:


{quote}
Does your box have SSD or spinning disk?
{quote}
I run the clusters (HDFS and HBase) on spinning disks.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-05 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922066#comment-13922066
 ] 

zhaojianbo commented on HBASE-10676:


{quote}
Do we have to make a copy when we read the previous header out, or can we use an 
AtomicReference?
{quote}
Good idea, I will try this modification.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch.patch


 The PrefetchedHeader variable in HFileBlock.FSReaderV2 is meant to avoid 
 backward seek operations, as the code comment says:
 {quote}
 we will not incur a backward seek operation if we have already read this 
 block's header as part of the previous read's look-ahead. And we also want to 
 skip reading the header again if it has already been read.
 {quote}
 But that is not what happens. In the 0.98 code, prefetchedHeader is a 
 ThreadLocal in each storefile reader, and during a RegionScanner's 
 lifecycle, different RPC handlers serve the scan requests of the same 
 scanner. Even if the handler of the previous scan call prefetched the next 
 block header, the handler of the current scan call still triggers a 
 backward seek. The process is like this:
 # rs handler1 serves the scan call, reads block1 and prefetches the header of 
 block2
 # rs handler2 serves the same scanner's next scan call; because rs handler2 
 does not know that rs handler1 already prefetched the header of block2, it 
 triggers a backward seek, reads block2, and prefetches the header of block3.
 So the read is not sequential. I therefore think the ThreadLocal is useless 
 and should be removed. I made the change and measured the performance of one, 
 two and four clients scanning the same region with one 
 storefile. The test environment is:
 # An HDFS cluster with a namenode, a secondary namenode and a datanode on one 
 machine
 # An HBase cluster with a ZooKeeper, a master and a regionserver on the same machine
 # The clients also run on the same machine.
 So all the data is local. The storefile is about 22.7GB of our online data, 
 18995949 KVs. Scanner caching is set to 1000, with setCacheBlocks(false).
 With the change, the total client scan time decreases by 21% in the 
 one-client case and by 11% in the two-client case. The four-client case is 
 almost unchanged. The detailed test data follows:
 ||case||client||time(ms)||
 | original | 1 | 306222 |
 | new | 1 | 241313 |
 | original | 2 | 416390 |
 | new | 2 | 369064 |
 | original | 4 | 555986 |
 | new | 4 | 562152 |





[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-04 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920445#comment-13920445
 ] 

zhaojianbo commented on HBASE-10676:


Uploaded the patch for the 0.98 branch.

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-trunk.patch


 The PrefetchedHeader variable in HFileBlock.FSReaderV2 is meant to avoid 
 backward seek operations, as the code comment says:
 {quote}
 we will not incur a backward seek operation if we have already read this 
 block's header as part of the previous read's look-ahead. And we also want to 
 skip reading the header again if it has already been read.
 {quote}
 But that is not what happens. In the 0.98 code, prefetchedHeader is a 
 ThreadLocal in each storefile reader, and during a RegionScanner's 
 lifecycle, different RPC handlers serve the scan requests of the same 
 scanner. Even if the handler of the previous scan call prefetched the next 
 block header, the handler of the current scan call still triggers a 
 backward seek. The process is like this:
 # rs handler1 serves the scan call, reads block1 and prefetches the header of 
 block2
 # rs handler2 serves the same scanner's next scan call; because rs handler2 
 does not know that rs handler1 already prefetched the header of block2, it 
 triggers a backward seek, reads block2, and prefetches the header of block3.
 So the read is not sequential. I therefore think the ThreadLocal is useless 
 and should be removed. I made the change and measured the performance of one, 
 two and four clients scanning the same region with one 
 storefile. The test environment is:
 # An HDFS cluster with a namenode, a secondary namenode and a datanode on one 
 machine
 # An HBase cluster with a ZooKeeper, a master and a regionserver on the same machine
 # The clients also run on the same machine.
 So all the data is local. The storefile is about 22.7GB, 18995949 KVs. 
 Scanner caching is set to 1000.
 With the change, the total client scan time decreases by 21% in the 
 one-client case and by 11% in the two-client case. The four-client case is 
 almost unchanged. The detailed test data follows:
 ||case||client||time(ms)||
 | original | 1 | 306222 |
 | new | 1 | 241313 |
 | original | 2 | 416390 |
 | new | 2 | 369064 |
 | original | 4 | 555986 |
 | new | 4 | 562152 |





[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920458#comment-13920458
 ] 

Ted Yu commented on HBASE-10676:


{code}
+  PrefetchedHeader prefetchedHeaderCopyed = new PrefetchedHeader(); 
{code}
Name the variable prefetchedHeaderCopy or prefetchedHeaderCopied.

Can you provide a performance comparison for the 8-client case?

 Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher 
 perforamce of scan
 

 Key: HBASE-10676
 URL: https://issues.apache.org/jira/browse/HBASE-10676
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.0
Reporter: zhaojianbo
 Attachments: HBASE-10676-0.98-branch.patch


 PrefetchedHeader variable in HFileBlock.FSReaderV2 is used for avoiding 
 backward seek operation as the comment said:
 {quote}
 we will not incur a backward seek operation if we have already read this 
 block's header as part of the previous read's look-ahead. And we also want to 
 skip reading the header again if it has already been read.
 {quote}
 But that is not the case. In the code of 0.98, prefetchedHeader is 
 threadlocal for one storefile reader, and in the RegionScanner 
 lifecycle,different rpc handlers will serve scan requests of the same 
 scanner. Even though one handler of previous scan call prefetched the next 
 block header, the other handlers of current scan call will still trigger a 
 backward seek operation. The process is like this:
 # rs handler1 serves the scan call, reads block1 and prefetches the header of 
 block2
 # rs handler2 serves the same scanner's next scan call, because rs handler2 
 doesn't know the header of block2 already prefetched by rs handler1, triggers 
 a backward seek and reads block2, and prefetches the header of block3.
 It is not the sequential read. So I think that the threadlocal is useless, 
 and should be abandoned. I did the work, and evaluated the performance of one 
 client, two client and four client scanning the same region with one 
 storefile.  The test environment is
 # A hdfs cluster with a namenode, a secondary namenode , a datanode in a 
 machine
 # A hbase cluster with a zk, a master, a regionserver in the same machine
 # clients are also in the same machine.
 So all the data is local. The storefile is about 22.7GB of our online data, 
 18995949 KVs. Caching is set to 1000.
 With the improvement, the total client scan time decreases by 21% in the 
 one-client case and by 11% in the two-client case. The four-client case is 
 almost the same. The detailed test data is as follows:
 ||case||clients||time (ms)||
 | original | 1 | 306222 |
 | new | 1 | 241313 |
 | original | 2 | 416390 |
 | new | 2 | 369064 |
 | original | 4 | 555986 |
 | new | 4 | 562152 |
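
 The percentages quoted above can be recomputed from the table. The following 
 is a small sketch of that arithmetic; the class and method names are 
 hypothetical, and only the timings are taken from the table.

```java
import java.util.Locale;

class ScanTimeDelta {
    /** Relative change of the new time against the original time, in percent. */
    static double percentChange(long originalMs, long newMs) {
        return 100.0 * (newMs - originalMs) / originalMs;
    }

    public static void main(String[] args) {
        // {clients, original time (ms), new time (ms)}, copied from the table.
        long[][] runs = {
            {1, 306222, 241313},
            {2, 416390, 369064},
            {4, 555986, 562152},
        };
        for (long[] run : runs) {
            System.out.printf(Locale.ROOT, "%d client(s): %+.1f%%%n",
                    run[0], percentChange(run[1], run[2]));
        }
    }
}
```

 A negative value means the patched code was faster. The four-client value is 
 slightly positive, which matches the statement that this case is almost the 
 same.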



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-04 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920462#comment-13920462
 ] 

zhaojianbo commented on HBASE-10676:


OK, I will run the 8-client case.



[jira] [Commented] (HBASE-10676) Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 make higher perforamce of scan

2014-03-04 Thread zhaojianbo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920468#comment-13920468
 ] 

zhaojianbo commented on HBASE-10676:


Patch modified. The variable is now named prefetchedHeaderCopied.
