mdedetrich commented on PR #2409:
URL: https://github.com/apache/pekko/pull/2409#issuecomment-3599163013

   @raboof @pjfanning Here is a status update: I found a bug in the
`ZstdDecompressor` which I am now trying to fix. The bug is quite simple: it
happens
[here](https://github.com/apache/pekko/pull/2409/files#diff-36b15bf36eaf7b9bd2c64a650c0d1f602a475383340766c45c0907aedf4f9b6cR50)
when the source buffer is larger than the byte buffer, causing a buffer
overflow. This bug didn't show up in tests because the individual `ByteString`
elements in the tests always happen to be smaller than `maxBytesPerChunk`, but
if you set `maxBytesPerChunk` to a small value such as 8 it shows up
immediately.
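
   As a rough illustration of this class of failure (not the PR's actual code),
here is a minimal sketch in plain Java using only `java.nio`. It assumes the
overflow is the kind a bulk `ByteBuffer.put` raises when the source is larger
than the remaining space; the class and method names are hypothetical.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of Pekko or zstd-jni.
final class OverflowDemo {
    // Returns true if copying `input` into a buffer of `capacity` bytes
    // with a single bulk put overflows the buffer.
    static boolean overflows(byte[] input, int capacity) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        try {
            // A bulk put needs input.length <= buffer.remaining().
            buffer.put(input);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Source smaller than the buffer: what the existing tests exercise.
        System.out.println(overflows(new byte[4], 8));
        // Source larger than the buffer: the case that was never exercised.
        System.out.println(overflows(new byte[16], 8));
    }
}
```

With a capacity of 8, any source element longer than 8 bytes takes the failing
path, which matches why a small `maxBytesPerChunk` exposes the problem.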
   
   To solve this, I wanted to create an ultra-simple implementation which would
essentially just be
`ByteString.fromArrayUnsafe(Zstd.decompress(input.toArrayUnsafe))`, but this is
not possible because `Zstd.decompress` needs to know the original size of the
decompressed element (and we don't have that information in a stream).
   
   Due to this, I have to shuffle data from the input `ByteString` into the
direct byte buffer while taking into account that the input `ByteString` can be
of any size. There is a naive implementation that transfers one byte at a time
(which is really slow), so instead the implementation copies as much as it can
depending on the amount of free space in the `outputBuffer`, but understandably
this is a more complicated solution. Note that this is also what the zstd-jni
test
[does](https://github.com/luben/zstd-jni/blob/9c3386d306086078155f58116a4d905e07239db4/src/test/scala/Zstd.scala#L516-L551).
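
   The copy-as-much-as-fits idea can be sketched in plain Java with only
`java.nio`. This is an illustration under assumptions, not the PR's
implementation: `ChunkedCopy`, `copyThrough` and `drain` are hypothetical
names, and the `ByteArrayOutputStream` sink merely stands in for whatever
consumes the filled buffer (in the PR that would be the zstd decompression
step).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper; not part of Pekko or zstd-jni.
final class ChunkedCopy {
    // Moves `input` through one fixed-size direct buffer, never putting
    // more than buffer.remaining() bytes in a single call.
    static byte[] copyThrough(byte[] input, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        // Allocated once and reused, instead of allocateDirect(input.length).
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int offset = 0;
        while (offset < input.length) {
            // Copy as much as currently fits, not one byte at a time.
            int n = Math.min(buffer.remaining(), input.length - offset);
            buffer.put(input, offset, n);
            offset += n;
            // Hand the data on when the buffer is full or the input is done.
            if (!buffer.hasRemaining() || offset == input.length) {
                drain(buffer, sink);
            }
        }
        return sink.toByteArray();
    }

    // Stand-in consumer: reads everything written so far, then resets
    // the buffer so it can be filled again.
    static void drain(ByteBuffer buffer, ByteArrayOutputStream sink) {
        buffer.flip();
        byte[] chunk = new byte[buffer.remaining()];
        buffer.get(chunk);
        sink.write(chunk, 0, chunk.length);
        buffer.clear();
    }
}
```

For example, `ChunkedCopy.copyThrough(data, 8)` on a 20-byte array fills and
drains the buffer three times (8, 8 and 4 bytes) and returns the same 20 bytes.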
 
   
   The advantage here is that this is exactly the same logic that is needed to
avoid the `ByteBuffer.allocateDirect(input.size)` in the current PR
implementation, which we need to remove anyway because of how slow
`ByteBuffer.allocateDirect(input.size)` is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

