FWIW, capnp messages already encode their own size at the start of the message (or, rather, they encode a segment table listing each segment's size; sum those and add the size of the table itself to get the total size).
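For illustration, here is a minimal Python sketch of that computation. It assumes the standard stream framing from the Cap'n Proto encoding docs: a little-endian uint32 holding (segment count - 1), then one uint32 per segment giving its size in 8-byte words, then padding so the segments start on an 8-byte boundary. The name `framed_message_size` is hypothetical, and the function only reads the table; it does not validate the message body.

```python
import struct
from typing import Optional


def framed_message_size(buf: bytes, offset: int = 0) -> Optional[int]:
    """Return the total byte length of the stream-framed message starting at
    `offset`, computed from its segment table alone, or None if `buf` is too
    short to hold the table. The message body is not checked."""
    if len(buf) - offset < 4:
        return None
    # First uint32: number of segments minus one (little-endian).
    (count_minus_one,) = struct.unpack_from("<I", buf, offset)
    count = count_minus_one + 1
    table_bytes = 4 + 4 * count
    if len(buf) - offset < table_bytes:
        return None  # a corrupt, oversized count also ends up here
    # One uint32 per segment: that segment's size in 8-byte words.
    seg_words = struct.unpack_from("<%dI" % count, buf, offset + 4)
    # The table is padded so the first segment is 8-byte aligned.
    header_bytes = (table_bytes + 7) // 8 * 8
    return header_bytes + 8 * sum(seg_words)
```

For example, a single-segment message whose segment is 2 words long takes 8 bytes of table plus 16 bytes of data, 24 bytes in total.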
This might be useful:
https://github.com/sandstorm-io/capnproto/blob/master/c++/src/capnp/serialize.h#L111

-Kenton

On Fri, Apr 14, 2017 at 1:17 PM, <stepan.buj...@gmail.com> wrote:

> Thanks for the reply. Option 1 seems pretty reasonable to me. I would
> probably go as far as to frame the messages with magic + message size;
> that way, when there's another magic (or end of file) at current position
> + message size, I can assume the message is probably correct.
>
> On Friday, April 14, 2017 at 1:08:55 PM UTC-7, Kenton Varda wrote:
>>
>> Hi Stepan,
>>
>> No, there's no easy way to detect the corruption you describe. In fact,
>> for most serialization formats, there's no solution to this problem. Once
>> you've lost track of message boundaries, it's impossible to tell the
>> difference between the start of a new message and data in the previous
>> message, since any message can contain arbitrary byte blobs (e.g. via the
>> `Data` type).
>>
>> If what you describe is a requirement for your use case, you could
>> accomplish it with an additional framing layer.
>>
>> Option 1: Choose a 128-bit unguessable random number before you start
>> writing. Write that number before each message. Now you can scan the bytes
>> of the file looking for this 128-bit sequence and, if you see it, you can
>> be fairly certain (p ~= 2^-128) that a new message starts after it. You
>> have to use a new random number for every file in case you ever embed a
>> whole file into another file.
>>
>> Option 2: Choose a magic number to write before each message, *and* scan
>> the contents of each message for this number, replacing it with an "escape
>> sequence" if seen. Do the opposite transformation while reading. This
>> allows you to detect boundaries "perfectly" (zero probability of false
>> positive), but you lose the benefits of zero-copy due to the need to
>> process escape sequences.
>>
>> -Kenton
>>
>> On Fri, Apr 14, 2017 at 12:35 PM, <stepan...@gmail.com> wrote:
>>
>>> I have a message that serializes into 24 bytes. I write two messages to
>>> a file, resulting in a file that's 48 bytes long. Now I truncate the file
>>> to 40 bytes and write one message, so the file now looks like this: one
>>> full message, one broken, one full message. Is there any way to iterate
>>> over the file and, when encountering the broken message, detect that it
>>> is broken and skip directly to the second full message? I've been using
>>> Python to read such a file with the following code:
>>>
>>> import capnp       # pycapnp; lets .capnp schemas be imported as modules
>>> import date_capnp  # loads date.capnp
>>>
>>> def main():
>>>     with open('dates.txt', 'rb') as fp:  # capnp messages are binary
>>>         for date in date_capnp.Date.read_multiple(fp):
>>>             print(date)
>>>
>>> But it fails with the following message:
>>>
>>> Message contains non-struct pointer where struct pointer was expected
>>>
>>> Also, if it's possible to detect such a message, is it possible to get
>>> its position and length? Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to capnproto+...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/capnproto.
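A minimal Python sketch of Option 1 combined with Stepan's magic + size check, reproducing the truncation scenario from the original question. It assumes each message is already serialized to bytes in the standard stream framing (segment table first), and that the reader learns the per-file marker from the file's first 16 bytes, since every message, including the first, is preceded by it; that convention belongs to this sketch, not to Cap'n Proto. `write_framed`, `scan_frames` and `MARKER_LEN` are hypothetical names. The "intact" flag only confirms that the segment table's claimed size matches the bytes up to the next marker; it does not validate the message contents.

```python
import secrets
import struct
from typing import Iterator, List, Optional, Tuple

MARKER_LEN = 16  # a 128-bit random marker, as in Option 1


def write_framed(messages: List[bytes], marker: Optional[bytes] = None) -> bytes:
    """Return marker + message for each already-serialized message.
    A fresh marker is drawn when none is given (i.e. for a new file);
    pass the file's existing marker when appending to it."""
    if marker is None:
        marker = secrets.token_bytes(MARKER_LEN)
    return b"".join(marker + m for m in messages)


def _table_size(buf: bytes) -> int:
    """Total size claimed by a stream-framed message's segment table,
    or -1 if `buf` is too short to hold the table."""
    if len(buf) < 4:
        return -1
    count = struct.unpack_from("<I", buf)[0] + 1
    table = 4 + 4 * count
    if len(buf) < table:
        return -1
    words = sum(struct.unpack_from("<%dI" % count, buf, 4))
    return (table + 7) // 8 * 8 + 8 * words


def scan_frames(data: bytes) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (offset, payload, intact) for each marker-delimited chunk.
    The marker is read from the start of the file. A chunk is intact only
    if its segment table accounts for exactly the bytes up to the next
    marker or the end of the file."""
    marker = data[:MARKER_LEN]
    pos = 0
    while pos < len(data):
        start = pos + MARKER_LEN
        nxt = data.find(marker, start)
        end = len(data) if nxt == -1 else nxt
        payload = data[start:end]
        yield start, payload, _table_size(payload) == len(payload)
        pos = end
```

Each yielded tuple gives the chunk's position and length, which is what the original question asked for; intact payloads can then be handed to the Cap'n Proto reader one at a time, and broken ones skipped.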
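Option 2 can be sketched with SLIP-style byte stuffing (RFC 1055), swapped in here as one concrete choice of magic byte and escape sequences; neither Cap'n Proto nor the thread prescribes these values, and `encode_frame` / `decode_frames` are hypothetical names. Decoding has to copy every byte to undo the escapes, which is the loss of zero-copy Kenton mentions.

```python
from typing import List

# SLIP (RFC 1055) byte values: one concrete choice of delimiter and escapes.
END = 0xC0      # delimiter written before each message
ESC = 0xDB      # escape introducer
ESC_END = 0xDC  # ESC ESC_END encodes a literal END byte
ESC_ESC = 0xDD  # ESC ESC_ESC encodes a literal ESC byte


def encode_frame(message: bytes) -> bytes:
    """Prefix the delimiter and escape any END/ESC bytes in the message."""
    out = bytearray([END])
    for b in message:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    return bytes(out)


def decode_frames(data: bytes) -> List[bytes]:
    """Split on the delimiter and undo the escaping. END never occurs inside
    an encoded body, so every END is a real frame boundary. A dangling or
    unknown escape (e.g. from truncation) is kept as-is."""
    frames = []
    # Element [0] is whatever precedes the first delimiter; discard it.
    for chunk in data.split(bytes([END]))[1:]:
        out = bytearray()
        i = 0
        while i < len(chunk):
            b = chunk[i]
            if b == ESC and i + 1 < len(chunk) and chunk[i + 1] in (ESC_END, ESC_ESC):
                out.append(END if chunk[i + 1] == ESC_END else ESC)
                i += 2
            else:
                out.append(b)
                i += 1
        frames.append(bytes(out))
    return frames
```

Because the delimiter cannot appear inside an encoded body, a truncated frame only damages itself and the following frames decode normally; a size check against the segment table is still needed to tell that the damaged frame is incomplete.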