[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-11-15 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-554300792
 
 
   Closing the PR since it has already been merged via commit 525f3c9c45.
   
   Thanks for reviewing and merging the PR @StephanEwen, and thanks @tillrohrmann for the review!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-11-06 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-550312495
 
 
   > I'm wondering whether it would be easier to do it now as we are opening 
the PRs and can update them bit by bit or after everything has been merged. I 
fear that the inertia might be a bit higher if everything is in place and a 
much bigger pile of code would needed to be touched. But I'm also ok to do it 
later if you say that this is not a problem.
   
   Thanks for sharing your thoughts @tillrohrmann; I think the concern makes sense.
   
   I have been preparing the commit replacing `ByteBuffer` with `MemorySegment` for some time and have just completed the work. It turns out only one new method is required for `MemorySegment` and no modifications to the existing ones (we could open another PR if we would like to add the HBase-like padding method for operating systems without unaligned memory access support, which is not that urgent), so the impact on existing components is under control. Please check the new commit for more details.
   
   I also checked the integration with the not-yet-committed `SpaceAllocator` using `MemorySegment` and confirmed that everything works fine (please check the [PoC branch](https://github.com/carp84/flink/tree/spillable-test) for details). I will update the [SpaceAllocator PR](https://github.com/apache/flink/pull/9516) soon if the changes here look good.




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-11-04 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-549420908
 
 
   Sorry for the late response @tillrohrmann.
   
   Comparing the current implementations of `ByteBufferUtils` and `MemorySegment`, I think most fundamental methods (`putPrimitive`, `getPrimitive`, `compareTo`, `copyTo`, etc.) are already included in `MemorySegment`, and I agree it's no big deal not to support operating systems without unaligned memory access (thus we don't need the `Unaligned` branches in `ByteBufferUtils`), so overall it's definitely workable to replace the `ByteBuffer` usage with `MemorySegment`.
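   To make the overlap concrete, here is a minimal, purely illustrative sketch (not Flink's actual code; the class and method names are hypothetical) of the kind of absolute-offset put/get/compare primitives that both `ByteBufferUtils` and `MemorySegment` cover:
   
   ```java
   import java.nio.ByteBuffer;
   
   /** Illustrative stand-in for the shared absolute-offset primitives. */
   public final class OffsetPrimitives {
   
       /** Absolute put of a long at the given offset, independent of buffer position. */
       static void putLong(ByteBuffer buf, int offset, long value) {
           buf.putLong(offset, value);
       }
   
       /** Absolute get of a long at the given offset. */
       static long getLong(ByteBuffer buf, int offset) {
           return buf.getLong(offset);
       }
   
       /** Lexicographic compare of two buffer regions, as a skip-list key compare needs. */
       static int compareTo(ByteBuffer a, int aOff, int aLen, ByteBuffer b, int bOff, int bLen) {
           int n = Math.min(aLen, bLen);
           for (int i = 0; i < n; i++) {
               int cmp = (a.get(aOff + i) & 0xFF) - (b.get(bOff + i) & 0xFF);
               if (cmp != 0) {
                   return cmp;
               }
           }
           return aLen - bLen;
       }
   
       public static void main(String[] args) {
           ByteBuffer buf = ByteBuffer.allocate(64);
           putLong(buf, 8, 42L);
           System.out.println(getLong(buf, 8)); // prints 42
       }
   }
   ```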
   
   Regarding the effort of the merge, there are mainly two aspects from my point of view:
   
   1. We may still need some modifications to `MemorySegment`. This to some extent conflicts with our original intention of making the spill-able backend a standalone module to avoid impact on existing components, so I'm a little bit concerned.
   
   2. In the current design we map the file segment into memory (as a `MappedByteBuffer`) and use it as the base of a "Chunk", further splitting it into "Buckets" for finer-grained usage, so everything in flight is a `ByteBuffer`. Replacing `ByteBuffer` with `MemorySegment` requires many interface changes in `CopyOnWriteSkipListStateMap` and `SpaceAllocator`.
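   For readers unfamiliar with the Chunk/Bucket layout, here is a simplified, hypothetical sketch (names and sizes are illustrative, not the actual `SpaceAllocator` code) of mapping a file region and carving it into fixed-size bucket views over the same memory:
   
   ```java
   import java.io.File;
   import java.io.RandomAccessFile;
   import java.nio.ByteBuffer;
   import java.nio.MappedByteBuffer;
   import java.nio.channels.FileChannel;
   
   /** Illustrative sketch: one mapped file region as a "Chunk", sliced into "Buckets". */
   public final class ChunkSketch {
       static final int BUCKET_SIZE = 4096;
   
       /** Map a file region and carve it into fixed-size bucket views sharing the memory. */
       static ByteBuffer[] mapBuckets(File file, int numBuckets) throws Exception {
           try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
                FileChannel ch = raf.getChannel()) {
               MappedByteBuffer chunk =
                   ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) numBuckets * BUCKET_SIZE);
               ByteBuffer[] buckets = new ByteBuffer[numBuckets];
               for (int i = 0; i < numBuckets; i++) {
                   ByteBuffer dup = chunk.duplicate();          // independent position/limit
                   dup.position(i * BUCKET_SIZE).limit((i + 1) * BUCKET_SIZE);
                   buckets[i] = dup.slice();                    // zero-copy view of one bucket
               }
               return buckets;
           }
       }
   
       public static void main(String[] args) throws Exception {
           File f = File.createTempFile("chunk", ".bin");
           f.deleteOnExit();
           ByteBuffer[] buckets = mapBuckets(f, 4);
           buckets[2].putLong(0, 7L); // write through a bucket view, backed by the file
           System.out.println(buckets[2].getLong(0)); // prints 7
       }
   }
   ```
   
   The mapping stays valid after the channel is closed, which is why the buckets can outlive the try-with-resources block.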
   
   I have made a new commit to address @StephanEwen's comments on the value serializer. I also tried to modify the key serializer to use `DataOutputSerializer` and `DataInputDeserializer`, but found it may not be as efficient due to two additional `System.arraycopy` calls, unless we change the existing design of position manipulation in `DataOutputSerializer` (currently we can only set the position after data ingestion and don't allow "holes"). Attached is the patch I made as a PoC; please check it and let me know your thoughts. Thanks.
   
   
[use_data_input_output_serializer_in_skiplist_key_serializer.patch.txt](https://github.com/apache/flink/files/3804829/use_data_input_output_serializer_in_skiplist_key_serializer.patch.txt)
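   To illustrate the copy-overhead concern (a simplified, hypothetical sketch, not the actual serializer code): an append-only writer that cannot leave a "hole" for a length prefix must stage the payload and copy it twice, whereas a writer that can skip ahead writes it once and back-fills the prefix.
   
   ```java
   import java.nio.ByteBuffer;
   
   /** Illustrative sketch of length-prefixed writing with and without a "hole". */
   public final class LengthPrefixSketch {
   
       /** Append-only writer: payload goes to a scratch array first, then is copied again. */
       static byte[] writeWithExtraCopies(byte[] payload) {
           byte[] scratch = new byte[payload.length];
           System.arraycopy(payload, 0, scratch, 0, payload.length);  // copy #1: stage payload
           byte[] out = new byte[4 + payload.length];
           ByteBuffer.wrap(out).putInt(payload.length);               // now the length is known
           System.arraycopy(scratch, 0, out, 4, payload.length);      // copy #2: final placement
           return out;
       }
   
       /** Writer that may skip ahead: reserve the 4-byte prefix, write the payload in place. */
       static byte[] writeInPlace(byte[] payload) {
           ByteBuffer out = ByteBuffer.allocate(4 + payload.length);
           out.position(4);                 // leave a "hole" for the length
           out.put(payload);                // single write of the payload
           out.putInt(0, payload.length);   // back-fill the prefix
           return out.array();
       }
   }
   ```
   
   Both produce identical bytes; the difference is only in how many times the payload is moved.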
   
   




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-10-20 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-544239753
 
 
   @StephanEwen here is a more detailed response to the review questions:
   
   Regarding the 2nd question about `unaligned` and its testing: from the [jdk source code](https://github.com/unofficial-openjdk/openjdk/blob/e87709def542f064a7ab9fa75542230e40876310/jdk/src/share/classes/java/nio/Bits.java#L611) and the [os.name/os.arch mapping](http://lopica.sourceforge.net/os.html) we can see that most of our tests actually fall into the `if (UNSAFE_UNALIGNED)` branch (also confirmed through test-coverage tools), and the `else` branch calls the `ByteBuffer.getXXX` methods, whose correctness the JVM itself guarantees, so I think we don't need to worry about the "dead path".
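   For readers who haven't looked at the JDK code, the structure of that branch can be sketched like this (an illustrative simplification, not the actual `Bits.java` code; the flag is passed as a parameter here only so both paths can be exercised):
   
   ```java
   import java.nio.ByteBuffer;
   
   /** Illustrative sketch of the fast unaligned read vs. the byte-by-byte fallback. */
   public final class UnalignedSketch {
   
       /**
        * Read a big-endian long at an arbitrary offset. When {@code unaligned} is true
        * we take a single wide read; otherwise we assemble the value from 8 one-byte
        * reads, which is safe on platforms without unaligned-access support.
        */
       static long getLong(ByteBuffer buf, int offset, boolean unaligned) {
           if (unaligned) {
               return buf.getLong(offset); // fast path: one wide read
           }
           long result = 0;
           for (int i = 0; i < 8; i++) {
               result = (result << 8) | (buf.get(offset + i) & 0xFFL); // fallback path
           }
           return result;
       }
   
       public static void main(String[] args) {
           ByteBuffer buf = ByteBuffer.allocate(16);
           buf.putLong(3, 0x0123456789ABCDEFL); // deliberately misaligned offset
           System.out.println(getLong(buf, 3, true) == getLong(buf, 3, false)); // true
       }
   }
   ```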
   
   Regarding the 4th question: checking the [source code of DirectByteBuffer.getLong](https://github.com/Himansu-Nayak/java7-sourcecode/blob/329bbb33cbe8620aee3cee533eec346b4b56facd/java/nio/DirectByteBuffer.java#L753) we can see that it checks `unaligned` and big/little endianness again on every call, so yes, in our implementation calling `UNSAFE.getLong(ByteBufferUtils.getBufferAddress(buf) + offset);` is indeed faster than `DirectByteBuffer.getLong(offset);`.
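   The address-based access pattern can be sketched as follows (an illustrative standalone example, not the PR's `ByteBufferUtils` code; note that `Unsafe` reads use native byte order, unlike `ByteBuffer`'s default big-endian order):
   
   ```java
   import java.lang.reflect.Field;
   import sun.misc.Unsafe;
   
   /**
    * Illustrative sketch of address-based access: once the native address is known,
    * each read is a single UNSAFE.getLong on (address + offset), skipping the
    * per-call bounds/alignment/endianness checks in DirectByteBuffer.getLong(offset).
    */
   public final class UnsafeAddressSketch {
       static final Unsafe UNSAFE = loadUnsafe();
   
       static Unsafe loadUnsafe() {
           try {
               // The usual reflective access to the singleton; works on JDK 8+.
               Field f = Unsafe.class.getDeclaredField("theUnsafe");
               f.setAccessible(true);
               return (Unsafe) f.get(null);
           } catch (ReflectiveOperationException e) {
               throw new ExceptionInInitializerError(e);
           }
       }
   
       public static void main(String[] args) {
           long base = UNSAFE.allocateMemory(64);   // stand-in for a buffer's native address
           try {
               UNSAFE.putLong(base + 8, 42L);       // write directly at address + offset
               System.out.println(UNSAFE.getLong(base + 8)); // prints 42
           } finally {
               UNSAFE.freeMemory(base);             // off-heap memory must be freed manually
           }
       }
   }
   ```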




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-10-16 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-542879295
 
 
   Thanks for the review @StephanEwen !
   
   Checking the code, I agree that `ByteBufferUtils` and `MemorySegment` have similar goals; I will take a closer look at whether we can merge them.
   
   I admit it was my mistake to directly borrow the classes ([UnsafeAccess](https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/UnsafeAccess.java) and [ByteBufferUtils](https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java)) from HBase without knowing whether similar functionality already existed in Flink. I checked `MemoryUtils`, didn't find enough util methods, and so went the "easy" way. The tests also followed the HBase approach ([TestByteBufferUtils](https://github.com/apache/hbase/blob/master/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestByteBufferUtils.java)), covering all logic through `ByteBufferUtilsTest` instead of introducing an `UnsafeHelpTest`.
   
   Since the unsafe tool and unaligned logic in HBase have been used for a long time in different environments, I believe they should work fine, though I agree it's better to have a unit test covering correctness.
   
   Will get back soon with updates after more thinking.
   
   PS. I'm still learning to switch between HBase and Flink code conventions; hopefully this is not causing big problems, and my apologies if it already has (pray).




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-10-11 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-541245502
 
 
   > Some of the tests seem to be a bit convoluted with a lot of indirections which are not easy to understand.
   
   Could you give me some examples of such "convoluted indirections"? Is it possible that the original `CopyOnWriteSkipListStateMapTest` is better, setting aside the test-case split-up?
   
   I've addressed all comments on `CopyOnWriteSkipListStateMap` and will get 
back on `CopyOnWriteSkipListStateMapTest` after getting some response. Thanks.




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-10-05 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-538676755
 
 
   > Why would this be the case? The only accessing threads should be the 
Task's main thread and the asynchronous checkpointing, right? Couldn't we say 
that the asynchronous checkpointing creates one single instance and reuses this 
instance for the whole checkpointing procedure? One could make it even a thread 
local variable if one wants to have an easy solution. So I'm not sure where the 
argument comes from that using a thin wrapping object around a pointer will 
necessarily decrease performance.
   
   When talking about wrapping a `Node` object and reusing it for every node manipulation, it is no longer accessed by a single thread, because there might be multiple checkpoints ongoing in parallel. Considering that we're using a copy-on-write mechanism, there might be multiple versions (or, say, a value chain) for a single key, which makes reusing a wrapper really tricky. I'm afraid it would not only increase the implementation complexity but even worsen readability, contrary to intuition. As we all know, the killer is not the 99% of common cases, but the 1% corner case (smile).




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-10-05 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-538673672
 
 
   > Where exactly do I see the performance comparison?
   
   Allow me to quote the conclusion: "we need 62.5% extra objects and 100% extra memory when using CSLM"; for the more detailed analysis please refer to the [google doc](https://docs.google.com/document/d/16VIY7o-18sM-pIlIYkbTuhKPmwfnqabCt_nlOARAzdg/edit#heading=h.iwes8krf3opd).




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-09-25 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-534964705
 
 
   > 2. That's what I was suspecting. I don't wanna necessarily change it but 
do we know how much performance we gain by this? I think there would have been 
also other solutions to the problem. For example, one could have a reusable 
object which has a mutable `nodeAddress` field. Then one would not have to 
create a new object every time. Ideally one would hide this behind a factory 
method which can decide how to handle it. Also the access to the field could 
have been encapsulated by the object and if needed could have happened lazily. 
The benefit of this approach would be that it is easier to maintain and test 
because the API would tell you what such an object can do and we would have a 
dedicated type instead of longs you have to pass around.
   
   About the performance impact, please refer to [the analysis of the JDK CSLM implementation](https://docs.google.com/document/d/16VIY7o-18sM-pIlIYkbTuhKPmwfnqabCt_nlOARAzdg/edit#) and to the compacted data structure we introduced in HBase to reduce GC pressure. Searching for `Key space schema` and `Value space schema` in `SkipListUtils` reveals a similar design here.
   
   About the reusable object, it would add a lot of effort/complexity to ensure there is no concurrent manipulation of it.




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-09-04 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-527878386
 
 
   Thanks for the continuous review efforts @tillrohrmann ! Before addressing the detailed comments, let me answer the general ones first:
   1. About thread safety of the implementation: the `testConcurrentSnapshots`, `testCopyOnWriteContracts` and `testSnapshotPruneValues` cases in `CopyOnWriteSkipListStateMapTest` are designed to cover the possible concurrency with asynchronous checkpoints. There might be one or two fields that could be `volatile` but aren't; checking the logic, those are all nice-to-have improvements but won't affect correctness even if kept as is. I will say more when addressing the specific comments.
   2. About the design of translating a node pointer into a `ByteBuffer` plus offset: yes, it's intentional, for performance reasons. If we constructed a "Node" object (for example) every time, the cost of object construction would be relatively high compared to "address"-only operations (and note that those address operations cannot be avoided for an off-heap data structure even with a Node abstraction). What's more, we don't need all fields of a Node for each operation; for example, to locate which Node a given key belongs to, we only need to compare keys until finding the target, and all other "fields" of a Node are not needed during the process.
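   The pointer-as-long idea can be sketched as follows (a hypothetical layout for illustration only; the real encoding lives in `SkipListUtils` and may differ):
   
   ```java
   /**
    * Illustrative sketch: a node "pointer" packed into a plain long instead of a
    * Node object, so following links costs only arithmetic, with no per-node
    * allocation on the hot path.
    */
   public final class NodePointerSketch {
   
       /** Hypothetical layout: high 32 bits = chunk id, low 32 bits = offset in chunk. */
       static long pointer(int chunkId, int offset) {
           return ((long) chunkId << 32) | (offset & 0xFFFFFFFFL);
       }
   
       static int chunkId(long pointer) {
           return (int) (pointer >>> 32);
       }
   
       static int offset(long pointer) {
           return (int) pointer;
       }
   
       public static void main(String[] args) {
           long p = pointer(3, 4096);
           System.out.println(chunkId(p) + " " + offset(p)); // prints "3 4096"
       }
   }
   ```
   
   With this scheme, an operation such as a key comparison can decode only the pieces of the node it actually needs.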
   




[GitHub] [flink] carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk state storage for spill-able heap backend

2019-08-28 Thread GitBox
carp84 commented on issue #9501: [FLINK-12697] [State Backends] Support on-disk 
state storage for spill-able heap backend
URL: https://github.com/apache/flink/pull/9501#issuecomment-525626160
 
 
   Thanks for the review @StephanEwen @tillrohrmann ! I've addressed most of the comments and left some questions open.
   
   For the Unsafe helper, since merging it into `MemoryUtils` now would require quite some refactoring, how about we move `UnsafeHelp` and `ByteBufferUtils` into the `flink-core` module under the `org.apache.flink.core.memory` package first, and then align all Unsafe/VarHandle usage in another PR? Thanks.

