jelmini commented on PR #800:
URL: https://github.com/apache/jackrabbit-oak/pull/800#issuecomment-1359240904

   > @smiroslav I think there's a potential problem here: tar archives are 
append-only, so if we write the references after each flush, we will produce a 
lot of waste. That isn't a problem in the Azure implementation where tar 
archives are actually just files and directories, so we might want to have 
different TarWriter implementations for local file system and Azure.
   
   @ahanikel If I understand this code well, only the 
`AbstractRemoteSegmentArchiveWriter` class calls the lambda, while the 
`SegmentTarWriter` is ignoring it (as per the default implementation in 
`SegmentArchiveWriter`), preventing the waste you mention for local files. 
   Although I think this implementation works, it's true that the design is 
somewhat fragile: someone could unknowingly decide in the future to override 
`flush(Runnable onSuccess)` in `SegmentTarWriter` and call the lambda. One 
option to prevent this, could be to implement it now explicitly in 
`SegmentTarWriter` by ignoring the lambda and adding a comment explaining why. 
Not pretty, but at least it should avoid future mistakes.
   
   The other option would be, as you suggest, to create 2 `TarWriter` 
implementations. However, currently `TarWriter` is not a public class and seems 
to be considered an implementation detail. We would need to create and expose a 
new public interface and change users of the current class to give the ability 
to accept different implementations of the interface. The design would probably 
be more robust and explicit, but I don't know how much effort it would need. It 
would probably make a lot of sense to work on this option if we anticipate 
other benefits of having specialised implementations of `TarWriter` for Azure 
or AWS, for exemple better optimisations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to