jelmini commented on PR #800: URL: https://github.com/apache/jackrabbit-oak/pull/800#issuecomment-1359240904
> @smiroslav I think there's a potential problem here: tar archives are append-only, so if we write the references after each flush, we will produce a lot of waste. That isn't a problem in the Azure implementation where tar archives are actually just files and directories, so we might want to have different TarWriter implementations for local file system and Azure. @ahanikel If I understand this code well, only the `AbstractRemoteSegmentArchiveWriter` class calls the lambda, while the `SegmentTarWriter` is ignoring it (as per the default implementation in `SegmentArchiveWriter`), preventing the waste you mention for local files. Although I think this implementation works, it's true that the design is somewhat fragile: someone could unknowingly decide in the future to override `flush(Runnable onSuccess)` in `SegmentTarWriter` and call the lambda. One option to prevent this, could be to implement it now explicitly in `SegmentTarWriter` by ignoring the lambda and adding a comment explaining why. Not pretty, but at least it should avoid future mistakes. The other option would be, as you suggest, to create 2 `TarWriter` implementations. However, currently `TarWriter` is not a public class and seems to be considered an implementation detail. We would need to create and expose a new public interface and change users of the current class to give the ability to accept different implementations of the interface. The design would probably be more robust and explicit, but I don't know how much effort it would need. It would probably make a lot of sense to work on this option if we anticipate other benefits of having specialised implementations of `TarWriter` for Azure or AWS, for exemple better optimisations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org