[protobuf] Byte Offsets into File with Multiple Messages

jonathan . wolk Tue, 25 Jun 2013 14:51:52 -0700

Hi All,

I know the topic of "multiple protobuf messages in a single file" has 
been covered a bunch, but I have a slightly different question.


Most answers to "multiple messages in a single file" suggest writing the 
size of each message immediately before the message, using a 
CodedOutputStream for the writing and a CodedInputStream for the reading. 
While this works fine, my use case is different: I don't want to read every 
single message in a file each time I open it. So instead of writing a size 
before each message, I write a "header" message at the top of the file and 
then the individual messages after it. My header is as follows:

<code>
option optimize_for = LITE_RUNTIME;

package MyPackage;

message ListHeader {
  message Entry {
    optional string name = 1;
    optional uint32 byte_offset_from_header = 2;
    optional uint32 size_bytes = 3;
  }
  repeated Entry entry = 1;
}
</code>
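
For illustration, the offsets and sizes stored in such a header could be computed as below. This is a minimal sketch, not the actual writer: `IndexEntry` and `BuildIndex` are hypothetical names, the struct is a plain stand-in for the generated `ListHeader::Entry` class, and the payloads are assumed to be already-serialized messages written back to back right after the header.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-struct stand-in for ListHeader::Entry (not generated code).
struct IndexEntry {
  std::string name;
  std::uint32_t byte_offset_from_header;  // payload start, counted from the end of the header
  std::uint32_t size_bytes;               // length of the serialized payload
};

// Builds the index for payloads that are written back to back after the header.
// Each pair is (entry name, already-serialized message bytes).
std::vector<IndexEntry> BuildIndex(
    const std::vector<std::pair<std::string, std::string>>& payloads) {
  std::vector<IndexEntry> index;
  std::uint32_t offset = 0;  // running offset relative to the end of the header
  for (const auto& p : payloads) {
    const std::uint32_t size = static_cast<std::uint32_t>(p.second.size());
    index.push_back({p.first, offset, size});
    offset += size;
  }
  return index;
}
```

Because the offsets are relative to the end of the header, they do not change when the header itself grows or shrinks as entries are added.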

Conceptually, at application start, I read this header for a given file and 
store it somewhere. After that, when a particular entry is needed from a 
file (referenced by name), I want to be able to open the file, jump to a 
given entry (via the entry byte offset into the file), read the message out 
and continue on my merry way. The problem is that none of the entry 
messages I read back contain correct data. They parse without error, but 
their contents are corrupt. The header message parses fine and contains the 
proper data.

I'm using the LITE_RUNTIME optimization, so I made subclasses of 
ZeroCopyInputStream that take a std::ifstream. When I want to read one of 
the "entry" messages, I call seekg on the ifstream opened on the file, 
create a ZeroCopyInputStream from that stream (via my own class), and then 
create a CodedInputStream on top of it. I set a limit on the 
CodedInputStream equal to the size of the entry from the header and then 
parse via ParseFromCodedStream. Is this a valid workflow (calling seekg on 
the stream and then making a ZeroCopyInputStream and CodedInputStream from 
it)? If not, how can I get the functionality I want?
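
One way to take the custom stream classes out of the picture while debugging is to seek, read exactly `size_bytes` into a buffer, and parse from that buffer. The following is a minimal sketch of that swapped-in approach, not the workflow described above; `ReadEntryBytes` is a hypothetical helper, and the stream is assumed to have been opened with `std::ios::binary`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Reads exactly size_bytes starting at absolute_offset into *out.
// Returns false if the seek fails or fewer bytes than requested are available.
bool ReadEntryBytes(std::istream& in, std::uint64_t absolute_offset,
                    std::uint32_t size_bytes, std::string* out) {
  assert(out != nullptr);
  in.clear();  // drop eof/fail bits left by earlier reads; seekg fails while failbit is set
  in.seekg(static_cast<std::streamoff>(absolute_offset), std::ios::beg);
  if (!in) return false;
  out->assign(size_bytes, '\0');
  if (size_bytes == 0) return true;  // nothing to read for an empty payload
  in.read(&(*out)[0], static_cast<std::streamsize>(size_bytes));
  return static_cast<std::uint64_t>(in.gcount()) == size_bytes;
}
```

The resulting bytes can then be handed to `ParseFromString` or `ParseFromArray`, both of which are available on `MessageLite`. If the data parsed this way is still wrong, the stored offsets are the more likely culprit than the stream wrappers.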

I calculate the byte offset that I pass to seekg as the entry's 
"byte_offset_from_header" plus the CurrentPosition of the CodedInputStream 
used to parse the header, which I believe should yield the total byte 
offset from the beginning of the file.
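
The offset arithmetic above, together with a bounds check that may help spot a bad index, can be sketched as follows. `AbsoluteOffset` and `EntryFitsInFile` are hypothetical helpers, and `header_end` is assumed to be the value `CurrentPosition()` reported right after the header was parsed. Since protobuf messages are not self-delimiting, this presumes the header parse was bounded in some way so that it stopped exactly at the end of the header.

```cpp
#include <cassert>
#include <cstdint>

// Absolute file offset of an entry: bytes consumed by the header plus the
// entry's offset relative to the end of the header.
std::uint64_t AbsoluteOffset(std::uint64_t header_end,
                             std::uint32_t byte_offset_from_header) {
  return header_end + byte_offset_from_header;
}

// True if the entry's byte range [start, start + size_bytes) lies entirely
// inside a file of file_size bytes.
bool EntryFitsInFile(std::uint64_t header_end,
                     std::uint32_t byte_offset_from_header,
                     std::uint32_t size_bytes, std::uint64_t file_size) {
  const std::uint64_t start = AbsoluteOffset(header_end, byte_offset_from_header);
  return start <= file_size && size_bytes <= file_size - start;
}
```

For example, with a 42-byte header and an entry at relative offset 100, the seek target is byte 142. An entry that fails `EntryFitsInFile` points past the end of the file, which would suggest the index or `header_end` is off.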

-Jonathan
