[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-23 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390594#comment-15390594
 ] 

Duo Zhang commented on HBASE-16278:
---

This is the test result on a machine with 2 * E5-2630 v2 and 128G RAM. I do not 
have exclusive use of the machine, so the result may contain some noise, but 
that does not change the qualitative result: CHM is much faster.

{noformat}
./bin/run.sh -f 1 -t 10 -i 10
Benchmark       Mode  Cnt     Score    Error  Units
CHMTest.test   thrpt   10  1328.692 ± 26.052  ops/s
CSLMTest.test  thrpt   10   371.875 ±  3.299  ops/s

./bin/run.sh -f 1 -t 20 -i 10
Benchmark       Mode  Cnt     Score    Error  Units
CHMTest.test   thrpt   10  2093.498 ± 42.794  ops/s
CSLMTest.test  thrpt   10   560.551 ± 23.960  ops/s

./bin/run.sh -f 1 -t 40 -i 10
Benchmark       Mode  Cnt     Score    Error  Units
CHMTest.test   thrpt   10  2072.665 ± 52.749  ops/s
CSLMTest.test  thrpt   10   621.861 ± 14.405  ops/s

./bin/run.sh -f 1 -t 100 -i 10
Benchmark       Mode  Cnt     Score    Error  Units
CHMTest.test   thrpt   10  2105.644 ± 27.742  ops/s
CSLMTest.test  thrpt   10   624.786 ± 12.830  ops/s

./bin/run.sh -f 1 -t 200 -i 10
Benchmark       Mode  Cnt     Score    Error  Units
CHMTest.test   thrpt   10  2160.738 ± 53.120  ops/s
CSLMTest.test  thrpt   10   639.131 ± 10.326  ops/s
{noformat}

> Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible
> --
>
> Key: HBASE-16278
> URL: https://issues.apache.org/jira/browse/HBASE-16278
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
>
> SSD and 10G network make our system CPU bound again, so the speed of code 
> that operates only on memory becomes more and more important.
> In HBase, if we want to use byte[] as a map key, we always use CSLM, even if 
> we do not need the map to be ordered. I know that this saves one object 
> allocation, since we cannot use byte[] directly as CHM's key. But we all 
> know that CHM is faster than CSLM, so I wonder whether it is worth using CSLM 
> instead of CHM only to avoid one extra object allocation.
> Then I wrote a simple JMH micro benchmark to test the performance of CHM and 
> CSLM. The code can be found here:
> https://github.com/Apache9/microbench
> It turns out that CHM is still much faster than CSLM, even with one extra 
> object allocation.
> So I think we should always use CHM if we do not need the keys to be sorted.
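
As a rough illustration of why byte[] cannot serve directly as a CHM key: Java arrays inherit identity-based equals/hashCode from Object, so a lookup with a second array holding the same bytes misses, while a CSLM built with a byte[] comparator finds it. The class name and the comparator below are assumptions made for this sketch, standing in for whichever comparator the real code supplies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class ByteArrayKeyDemo {
  // Unsigned lexicographic order (hypothetical stand-in comparator for this sketch).
  static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  };

  static byte[] row() {
    // Returns a fresh array on every call, as decoded request data typically would.
    return "row1".getBytes(StandardCharsets.UTF_8);
  }

  static boolean foundInHashMap() {
    ConcurrentHashMap<byte[], String> chm = new ConcurrentHashMap<>();
    chm.put(row(), "v");
    // Same content, different object: identity-based equals makes the lookup miss.
    return chm.containsKey(row());
  }

  static boolean foundInSkipListMap() {
    ConcurrentSkipListMap<byte[], String> cslm = new ConcurrentSkipListMap<>(LEXICOGRAPHIC);
    cslm.put(row(), "v");
    // The comparator looks at content, so an equal-content array is found.
    return cslm.containsKey(row());
  }

  public static void main(String[] args) {
    System.out.println("CHM finds equal-content key:  " + foundInHashMap());
    System.out.println("CSLM finds equal-content key: " + foundInSkipListMap());
  }
}
```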



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390800#comment-15390800
 ] 

stack commented on HBASE-16278:
---

+1

Anything you want me to try on a cluster? Anything that would highlight the 
benefit of CHM over CSLM?

> Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible
> --
>
> Key: HBASE-16278
> URL: https://issues.apache.org/jira/browse/HBASE-16278
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang


[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-23 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390945#comment-15390945
 ] 

Duo Zhang commented on HBASE-16278:
---

[~stack] We can find unnecessary CSLM usages in the code and open sub-tasks to 
change them. I do not know whether it would make a big difference right now, 
since HBase is not a micro benchmark...



[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391123#comment-15391123
 ] 

stack commented on HBASE-16278:
---

Makes sense [~Apache9]



[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-24 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391387#comment-15391387
 ] 

Duo Zhang commented on HBASE-16278:
---

[~ikeda] One problem is that we may use a byte[] as a key multiple times in a 
method, so declaring the map with something like ByteArrayWrapper as its key 
type avoids allocating an extra object on every access.

I also think it is a burden to track interface changes between different Java 
versions. For example, Java 8 added the computeIfAbsent method, which is very 
useful. Master is declared to support only Java 8+, so in master we should 
also implement this method. But for branch-1 we cannot implement it, since we 
must also support Java 7. Of course, this is not an unsolvable problem, but I 
think a wrapper class is simple and sufficient.
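
A minimal sketch of the wrapper approach described above, under stated assumptions: the ByteArrayWrapper here is a hypothetical implementation written for illustration (only the name comes from the comment), and the counter map is an invented use case. It shows the key being wrapped once and then used for several map calls in the Java 7 style, next to the Java 8 computeIfAbsent equivalent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class WrapperKeyDemo {
  /** Hypothetical wrapper giving a byte[] content-based equals/hashCode. */
  static final class ByteArrayWrapper {
    private final byte[] bytes;
    private final int hash;

    ByteArrayWrapper(byte[] bytes) {
      this.bytes = bytes; // not copied: the caller must not mutate the array afterwards
      this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ByteArrayWrapper && Arrays.equals(bytes, ((ByteArrayWrapper) o).bytes);
    }

    @Override
    public int hashCode() {
      return hash;
    }
  }

  /** Java 7 style: the key is used in two map calls but wrapped only once. */
  static AtomicLong counterJava7(ConcurrentHashMap<ByteArrayWrapper, AtomicLong> map, byte[] row) {
    ByteArrayWrapper key = new ByteArrayWrapper(row); // the one extra allocation
    AtomicLong counter = map.get(key);
    if (counter == null) {
      AtomicLong created = new AtomicLong();
      counter = map.putIfAbsent(key, created);
      if (counter == null) {
        counter = created; // our value won the race
      }
    }
    return counter;
  }

  /** Java 8+ style: computeIfAbsent folds the get/putIfAbsent pair into one call. */
  static AtomicLong counterJava8(ConcurrentHashMap<ByteArrayWrapper, AtomicLong> map, byte[] row) {
    return map.computeIfAbsent(new ByteArrayWrapper(row), k -> new AtomicLong());
  }

  /** Increments the same logical key through both code paths and returns the final count. */
  static long demo() {
    ConcurrentHashMap<ByteArrayWrapper, AtomicLong> map = new ConcurrentHashMap<>();
    counterJava7(map, "row1".getBytes(StandardCharsets.UTF_8)).incrementAndGet();
    return counterJava8(map, "row1".getBytes(StandardCharsets.UTF_8)).incrementAndGet();
  }

  public static void main(String[] args) {
    System.out.println("count for row1 = " + demo());
  }
}
```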

Thanks.

> Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible
> --
>
> Key: HBASE-16278
> URL: https://issues.apache.org/jira/browse/HBASE-16278
> Project: HBase
> Issue Type: Improvement
> Reporter: Duo Zhang
> Attachments: ConcurrentHashByteArrayMap.java


[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-25 Thread Hiroshi Ikeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391480#comment-15391480
 ] 

Hiroshi Ikeda commented on HBASE-16278:
---

Creating an object is itself quite lightweight and costs almost the same as 
uncontended synchronization, according to my old Java book. If you have not 
worried about how many times we access volatile fields or perform CAS, it does 
not make sense to worry only about creating small objects (and concurrent maps 
use CAS many times in order to avoid blocking).

However, I agree that using raw byte arrays is, in general, a bad design 
because of their mutability. But it seems too late to change (we could not 
have avoided it from the beginning, since Hadoop adopted byte arrays in its 
API), and there are too many usages of raw bytes. From the viewpoints of both 
performance and object-oriented programming, I think it would not pay to do 
something about that.

I didn't know that master supports only Java 8+, but I think a developer who 
wants to use the new method can fix the code. After all, maps using mutable 
keys cannot be used for general purposes.
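
The mutability concern can be shown with a short sketch. It assumes a hypothetical key class that wraps the caller's array without copying it. Once the caller changes the array, the entry can no longer be found under either the old or the new content, because the map placed the entry using the hash computed at insertion time.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

final class MutableKeyDemo {
  /** Hypothetical key that shares (does not copy) the array it is given. */
  static final class Key {
    final byte[] bytes;

    Key(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  /** Returns {found before mutation, found by old content after, found by new content after}. */
  static boolean[] demo() {
    ConcurrentHashMap<Key, String> map = new ConcurrentHashMap<>();
    byte[] row = {1, 2, 3};
    map.put(new Key(row), "v");

    boolean before = map.containsKey(new Key(new byte[] {1, 2, 3}));

    row[0] = 9; // the caller mutates the array the stored key still references

    // Old content: the hash matches the stored one, but equals() now fails.
    boolean byOldContent = map.containsKey(new Key(new byte[] {1, 2, 3}));
    // New content: equals() would pass, but the hash differs from the stored one.
    boolean byNewContent = map.containsKey(new Key(new byte[] {9, 2, 3}));

    return new boolean[] {before, byOldContent, byNewContent};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo()));
  }
}
```

Copying the array in the key's constructor would avoid this, at the cost of one more allocation per key.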



[jira] [Commented] (HBASE-16278) Use ConcurrentHashMap instead of ConcurrentSkipListMap if possible

2016-07-25 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391520#comment-15391520
 ] 

Heng Chen commented on HBASE-16278:
---

I don't think using {{ByteArrayWrapper}}, as [~Apache9] suggested above, is 
essentially different from the {{ConcurrentHashByteArrayMap}} that [~ikeda] 
uploaded. It seems {{ConcurrentHashByteArrayMap}} wraps the byte[] 
internally, whereas [~Apache9] does it explicitly outside the CHM.
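
To make the "internal" variant concrete, here is a rough sketch of a map that accepts byte[] in its API and wraps it internally. This is NOT the attached ConcurrentHashByteArrayMap, whose source is not shown in this thread. The class and its methods are hypothetical and only illustrate the shape of the trade-off: callers never see a wrapper, but every call allocates one.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical facade over CHM; not the attachment discussed in this thread. */
final class InternallyWrappingMap<V> {
  /** Private content-based key; callers only ever pass raw byte[]. */
  private static final class Key {
    private final byte[] bytes;

    Key(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  private final ConcurrentHashMap<Key, V> delegate = new ConcurrentHashMap<>();

  V get(byte[] key) {
    return delegate.get(new Key(key)); // wrapper allocated on every call
  }

  V put(byte[] key, V value) {
    return delegate.put(new Key(key), value);
  }

  V putIfAbsent(byte[] key, V value) {
    return delegate.putIfAbsent(new Key(key), value);
  }

  int size() {
    return delegate.size();
  }

  public static void main(String[] args) {
    InternallyWrappingMap<String> map = new InternallyWrappingMap<>();
    map.put(new byte[] {1, 2}, "a");
    System.out.println(map.get(new byte[] {1, 2}));
  }
}
```

With the explicit wrapper, a method that touches the same key several times can wrap it once and reuse it. With the internal variant, any method added to the underlying map in a later Java version has to be mirrored by hand.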
