Hi Mike,

Thanks for your interest in the CompactionPipeline implementation. It is a pleasure to discuss it with you.

Regarding your remark that we are implementing our own copy-on-write (COW) list: maybe it is close, but in classic COW everybody shares the same read-only copy. When someone tries to write to that copy, the writer gets its own private copy with the write applied. This is not what happens in the pipeline.

In the pipeline we let everyone read the same read-only copy, because read accesses are far more frequent. When a rare update to the pipeline happens, it is synchronized on the (writable) pipeline itself, and then the read-only copy is quickly rebuilt. All of this is done to make synchronization faster. In any case, I am not aware of an off-the-shelf Java list that gives me the synchronization I want. Please let me know if I am wrong.

Regarding "I am concerned about the LL copy in pushHead - even if addFirst is faster, a LL copy is fairly slow and likely loses us any gains": the read-only copy is recreated whenever the background pipeline changes (addFirst, swap, replaceAtIndex). These are rare operations, happening on snapshot, compaction, and flattening, respectively. Moreover, copying the segment list copies only the references, not the segment data itself. We previously had a different kind of synchronization (without the read-only copy), and it was slower. So if you believe that creating the read-only copy causes a performance problem, please provide measurements.

Regarding "Also, I'm a little dubious on the use of LL given that we support a replaceAtIndex which will be much faster in an array": I generally agree that changing the implementation of "readOnlyCopy" from LinkedList to ArrayList might be beneficial here, especially for the replaceAtIndex case. I don't see how ArrayDeque helps us.
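To make the scheme concrete, here is a minimal sketch under simplifying assumptions. It is not the actual CompactionPipeline code. The class name PipelineSketch and the methods getSegments and refreshReadOnlyCopy are hypothetical. Segments are reduced to a generic type S, and swap is omitted. The read-only copy is built as an ArrayList, following the LinkedList-to-ArrayList idea above. Writers lock the pipeline and republish an immutable, shallow snapshot through a volatile field. Readers never lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch, not the HBase CompactionPipeline: a writable list
// guarded by this object's monitor, plus a read-only snapshot for readers.
class PipelineSketch<S> {
  // Writable pipeline; only accessed while holding the lock on "this".
  private final LinkedList<S> pipeline = new LinkedList<>();
  // Immutable snapshot published to readers; replaced on every (rare) write.
  private volatile List<S> readOnlyCopy = Collections.emptyList();

  // Frequent, lock-free read path: returns the current immutable snapshot.
  public List<S> getSegments() {
    return readOnlyCopy;
  }

  // Rare write (e.g. on snapshot): the new segment goes to the head.
  public synchronized void pushHead(S segment) {
    pipeline.addFirst(segment);
    refreshReadOnlyCopy();
  }

  // Rare write (e.g. on flattening). LinkedList.set walks the list, which is
  // tolerable here only because this operation is rare.
  public synchronized void replaceAtIndex(int index, S segment) {
    pipeline.set(index, segment);
    refreshReadOnlyCopy();
  }

  // Shallow copy: only segment references are copied, never segment data.
  private void refreshReadOnlyCopy() {
    readOnlyCopy = Collections.unmodifiableList(new ArrayList<>(pipeline));
  }

  public static void main(String[] args) {
    PipelineSketch<String> p = new PipelineSketch<>();
    p.pushHead("s1");
    List<String> before = p.getSegments(); // snapshot taken by a reader
    p.pushHead("s2");
    p.replaceAtIndex(1, "s1-flat");
    System.out.println(before);           // prints [s1]
    System.out.println(p.getSegments());  // prints [s2, s1-flat]
  }
}
```

A reader holding an older snapshot keeps seeing it unchanged. Each rare write pays for one copy of the reference list, and the hot read path costs only a volatile read.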
Thanks,
Anastasia

On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
I believe the comments there are mainly about the concurrency problem, not about linked list vs. array list, at least for me...

2018-03-11 4:12 GMT+08:00 Mike Drob <mad...@cloudera.com>:
> Hi devs,
>
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
>
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
>
> Can we improve by using an ArrayDeque?
>
> Eschar, Anastasia, WDYT?
>
> Thanks,
> Mike
>
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>