vamossagar12 commented on a change in pull request #10798:
URL: https://github.com/apache/kafka/pull/10798#discussion_r644616944



##########
File path: 
streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java
##########
@@ -505,6 +506,14 @@ private void closeOpenIterators() {
         }
     }
 
+    private ByteBuffer createDirectByteBufferAndPut(byte[] bytes) {
+        ByteBuffer directBuffer = ByteBuffer.allocateDirect(bytes.length);

Review comment:
      Thank you @guozhangwang, @cadonna. I agree that creating the buffer on every call does not make much sense. I should have been more careful before adding it, and should have asked for the internal benchmarks first. In that case, would it even make sense to have it in an API like put()? Should we instead use it for the putAll()/range/reverseRange/prefixSeek operations?
   
   That's because in the case of put(), it is difficult to know up front how many put operations will be requested. If users were using the rocksdb library directly, they could create a DirectByteBuffer once and push as many entries through it as they want.
   
   Based on my conversations with the rocksdb community, one of the comments was this:
   
   `Extracting large amounts of data under high concurrency, non-direct byte 
buffer will bring serious GC problems to the upper level Java services.`
   
   I guess we could target those APIs instead? WDYT?
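The batch-oriented APIs mentioned above are a natural fit because the whole batch is visible up front. A minimal sketch of the idea (the class and method names here are illustrative, not the actual RocksDBStore internals): size one direct buffer once per batch and reuse it for every entry, instead of allocating per record.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual RocksDBStore code: in a batch API such
// as putAll(), one direct buffer can be allocated at a sufficient capacity
// and reused for every entry in the batch.
public class BatchDirectBufferSketch {

    // Largest key or value in the batch, so the buffer is sized only once.
    static int maxEntrySize(final List<Map.Entry<byte[], byte[]>> batch) {
        int max = 0;
        for (final Map.Entry<byte[], byte[]> e : batch) {
            max = Math.max(max, Math.max(e.getKey().length, e.getValue().length));
        }
        return max;
    }

    static void putAll(final List<Map.Entry<byte[], byte[]>> batch) {
        final ByteBuffer buffer = ByteBuffer.allocateDirect(maxEntrySize(batch));
        for (final Map.Entry<byte[], byte[]> entry : batch) {
            buffer.clear();
            buffer.put(entry.getKey());
            buffer.flip();
            // ...hand 'buffer' to the native write path here, then repeat the
            // clear/put/flip cycle for the value; one allocation serves the
            // whole batch instead of one allocation per entry.
        }
    }
}
```

With per-call put(), by contrast, there is no such natural point to size the buffer, which is why the reuse question is harder there.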
   
   
   

##########
File path: 
streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java
##########
@@ -505,6 +506,14 @@ private void closeOpenIterators() {
         }
     }
 
+    private ByteBuffer createDirectByteBufferAndPut(byte[] bytes) {
+        ByteBuffer directBuffer = ByteBuffer.allocateDirect(bytes.length);

Review comment:
      OK, that makes sense. High concurrency is one of the cases where this might be useful. Having said that, the PR has benchmarking numbers for a large number of put operations performed in a single-threaded manner. As per those numbers, the direct byte buffer was 37% faster, with 0 GC cycles.
   
   Here is the comment: 
https://github.com/facebook/rocksdb/pull/2283#issuecomment-561563037
   
   Users of Kafka Streams might call put() in a similar manner, looping through a bunch of records and calling put() for each insert. From the state store side, we could create one DirectByteBuffer for put() and keep reusing it, subject to testing. But that usage pattern might not always hold.
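To make the "create one DirectByteBuffer and keep reusing it" idea concrete, here is a hedged sketch (the class and method names are hypothetical, not the actual RocksDBStore code): keep one direct buffer and reuse it across put() calls, growing it only when a record exceeds the current capacity.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the actual RocksDBStore code: one direct buffer
// is kept and reused across put() calls, avoiding a fresh allocateDirect()
// (and the associated GC/native-memory churn) on every call.
public class ReusableDirectBuffer {
    private ByteBuffer directBuffer = ByteBuffer.allocateDirect(64);

    // Copies 'bytes' into the reusable direct buffer, growing it if needed,
    // and returns the buffer positioned for reading.
    public ByteBuffer fill(final byte[] bytes) {
        if (directBuffer.capacity() < bytes.length) {
            directBuffer = ByteBuffer.allocateDirect(bytes.length);
        }
        directBuffer.clear();
        directBuffer.put(bytes);
        directBuffer.flip();
        return directBuffer;
    }

    public static void main(final String[] args) {
        final ReusableDirectBuffer reusable = new ReusableDirectBuffer();
        // Two consecutive put()-style calls reuse the same underlying buffer.
        final ByteBuffer first = reusable.fill("key-1".getBytes());
        System.out.println(first.remaining()); // 5 bytes ready to read
        final ByteBuffer second = reusable.fill("key-2".getBytes());
        System.out.println(first == second);   // true: same instance reused
    }
}
```

Note this sketch assumes single-threaded access per buffer; under concurrency the buffer would need to be per-thread (e.g. a ThreadLocal) or guarded by the store's existing locking.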
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

