[ https://issues.apache.org/jira/browse/OAK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749811#comment-16749811 ]

Francesco Mari commented on OAK-6749:
-------------------------------------

The Cold Standby implementation in 1.6 is far too primitive compared to the one 
in 1.8. Too many earlier changes would need to be backported to make this 
backport feasible, and that is just too risky - especially since [~Csaba Varga] 
managed to find a workaround for the issue. I'm going to resolve this issue.

> Segment-Tar standby sync fails with "in-memory" blobs present in the source 
> repo
> --------------------------------------------------------------------------------
>
>                 Key: OAK-6749
>                 URL: https://issues.apache.org/jira/browse/OAK-6749
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob, tarmk-standby
>    Affects Versions: 1.6.2
>            Reporter: Csaba Varga
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.10.1, 1.8.12, 1.10
>
>         Attachments: OAK-6749-01.patch, OAK-6749-02.patch
>
>
> We have run into an issue when trying to transition from an active/active 
> Mongo NodeStore cluster to a single Segment-Tar server with cold standby. The 
> issue itself manifests when the standby server tries to pull changes from the 
> primary after the first round of online revision GC.
> Let me summarize how we ended up in the current state, along with my 
> hypothesis about what happened, based on my debugging so far:
> # We started with a Mongo NodeStore and an external FileDataStore as the blob 
> store. The FileDataStore was set up with minRecordLength=4096. The Mongo 
> store stores blobs below minRecordLength as special "in-memory" blobIDs, where 
> the data itself is baked into the ID string in hex (a conceptual sketch of 
> this follows the log excerpt below).
> # We executed a sidegrade of the Mongo store into a Segment-Tar store. Our 
> datastore is over 1TB in size, so copying the binaries wasn't an option; the 
> new repository simply reuses the existing datastore. The "in-memory" blobIDs 
> still look like external blobIDs to the sidegrade process, so they were copied 
> into the Segment-Tar repository as-is, instead of being converted into the 
> efficient in-line format.
> # The server started up without issues on the new Segment-Tar store. The 
> migrated "in-memory" blobIDs seem to work fine, if a bit sub-optimally.
> # At this point, we created a cold standby instance by copying the files of 
> the stopped primary instance and making the necessary config changes on both 
> servers.
> # Everything worked fine until the primary server started its first round of 
> online revision GC. After that process completed, the standby node started 
> throwing exceptions about missing segments, and eventually stopped 
> altogether. In the meantime, the following warning showed up in the primary 
> log:
> {code:java}
> 29.09.2017 06:12:08.088 *WARN* [nioEventLoopGroup-3-10] org.apache.jackrabbit.oak.segment.standby.server.ExceptionHandler Exception caught on the server
> io.netty.handler.codec.TooLongFrameException: frame length (8208) exceeds the allowed maximum (8192)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:146)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:142)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:99)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:75)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:611)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:552)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:466)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
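> For illustration, here is a minimal sketch of the "in-memory" blobID idea 
> mentioned in step 1: the content of a small binary is hex-encoded straight 
> into the ID string, so resolving the blob never has to touch the datastore. 
> The exact format Oak uses (prefixes, length markers) is not reproduced here; 
> this only shows the concept.
> {code:java}
> // Conceptual sketch only; the real Oak blobID format carries additional
> // markers that are omitted here.
> public class InMemoryBlobIdSketch {
> 
>     // Build an ID that carries the content itself, hex-encoded.
>     static String encode(byte[] content) {
>         StringBuilder sb = new StringBuilder(content.length * 2);
>         for (byte b : content) {
>             sb.append(String.format("%02x", b));
>         }
>         return sb.toString();
>     }
> 
>     // Resolving such an ID is just decoding the hex back into bytes.
>     static byte[] decode(String blobId) {
>         byte[] content = new byte[blobId.length() / 2];
>         for (int i = 0; i < content.length; i++) {
>             content[i] = (byte) Integer.parseInt(blobId.substring(2 * i, 2 * i + 2), 16);
>         }
>         return content;
>     }
> }
> {code}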
> This is what seems to be happening:
> # The revision GC creates brand new segments, and the standby instance starts 
> pulling them into its own store.
> # When the standby sees an "in-memory" blobID, it decides that it doesn't 
> have this blob in its own blobstore, so it proceeds to ask for the bytes of 
> the blob from the primary, even though they are encoded in the ID itself.
> # The longest blobIDs can be more than 8K in size (a 4K blob doubles in 
> length under hex encoding; see the arithmetic sketch below). When such a long 
> blobID is submitted to the primary, the request gets rejected because of its 
> excessive length. The secondary keeps waiting until the request times out, 
> and no progress is made in syncing.
> The issue doesn't pop up with repositories that started as Segment-Tar since 
> Segment-Tar always inlines blobs below some hardcoded threshold (16K if I 
> remember correctly).
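> To make the length problem concrete, here is a rough back-of-the-envelope 
> check. The only assumption is that the request line sent to the primary 
> consists of the full blobID plus a short fixed prefix; the 16-byte difference 
> between the reported frame length (8208) and the limit (8192) would then be 
> that prefix.
> {code:java}
> public class FrameLengthArithmetic {
>     public static void main(String[] args) {
>         int minRecordLength = 4096;            // FileDataStore threshold for "in-memory" blobs
>         int hexIdLength = minRecordLength * 2; // 4096 bytes -> 8192 hex characters
>         int lineDecoderLimit = 8192;           // maximum reported in the exception
> 
>         // The ID alone already fills the whole frame, so adding any request
>         // prefix pushes the line past the allowed maximum.
>         System.out.println(hexIdLength + " + prefix > " + lineDecoderLimit);
>     }
> }
> {code}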
> I think there could be multiple ways to approach this, not mutually exclusive:
> * Special-case the "in-memory" blobIDs during sidegrade and replace them with 
> the "native" segment values. If hardcoding knowledge about this 
> implementation detail isn't desired, there could be a new option for the 
> sidegrade process to force "inlining" of blobs below a certain threshold, 
> even if they aren't in-line in the source repo.
> * Special-case the "in-memory" blobIDs in StandbyDiff so they aren't 
> requested from the primary, but are either kept as-is or converted to the 
> "native" format.
> * Increase the network packet size limit in the sync protocol, or allow it 
> to be configured (see the sketch below). This is the least efficient option, 
> but has the least impact on the code.
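> As an illustration of the third option, below is a minimal sketch of where 
> such a limit lives in a Netty pipeline. This is not the actual Oak standby 
> server code; the class name and the way the value is configured are 
> assumptions. LineBasedFrameDecoder is, however, the decoder that appears in 
> the stack trace above, and its constructor takes the maximum allowed line 
> length.
> {code:java}
> import io.netty.channel.ChannelInitializer;
> import io.netty.channel.socket.SocketChannel;
> import io.netty.handler.codec.LineBasedFrameDecoder;
> 
> // Illustration only, not the actual Oak standby server setup: it shows how a
> // hardcoded 8192-byte line limit could be replaced with a configurable value
> // large enough for the longest hex-encoded blobID.
> public class ConfigurableFrameInitializer extends ChannelInitializer<SocketChannel> {
> 
>     private final int maxFrameLength; // e.g. read from a system property or OSGi config
> 
>     public ConfigurableFrameInitializer(int maxFrameLength) {
>         this.maxFrameLength = maxFrameLength;
>     }
> 
>     @Override
>     protected void initChannel(SocketChannel ch) {
>         // The decoder from the stack trace; passing a larger maximum lifts the
>         // 8192-byte limit on a single request line.
>         ch.pipeline().addLast(new LineBasedFrameDecoder(maxFrameLength));
>         // ... the remaining handlers of the standby protocol would follow here.
>     }
> }
> {code}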
> I can work on detailed reproduction steps if needed, but I'd rather not do it 
> beforehand because this is rather cumbersome to reproduce.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
