Re: [protobuf] Problem : Serialized in protobuf-net, deserialize in C++ app

2010-03-11 Thread Michael Poole
Roey writes:

[snip]
 The problem is, I get a memory access violation when I try to
 deserialize it :(
 Unhandled exception at 0x02f166d8 in wmplayer.exe: 0xC005: Access
 violation writing location 0x.

 Are any of the things I'm doing here wrong for what I'm trying to do
 (serialize in C# .NET and deserialize in C++)?

Most likely yes, but it's hard to suggest what that is without a brief
code example that shows the crash.  The crash is obviously trying to
write into a null pointer, but only you can find out what code is at
address 0x02f166d8.

Michael Poole

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] marking end of record a good idea?

2010-02-12 Thread Michael Poole
Yang writes:

 Currently the Protocol Buffers parser reads the stream until it reaches
 the end of the buffer,

 but Hadoop currently has a bug
 ( https://issues.apache.org/jira/browse/MAPREDUCE-1487 ) that presents a
 buffer larger than the actual message to the PB parser, so PB parses
 some junk and fails.

 Right now the only hack I have against this is to add an end-of-record
 tag to PB, so that the parser recognizes it in addition to the end of
 the buffer.

 I think this could be a good idea, since we proactively guard against
 unsafe situations like the above.

Why not use something like the existing Java methods writeDelimitedTo()
and parseDelimitedFrom() to write the message's length in front of the
message?  (This is also one of the preferred ways to store multiple
messages in a stream.  These methods currently exist only in Java, but
the same approach is straightforward to implement in C++ with
CodedInputStream::PushLimit, and I would guess that Python also gives
you a way to do it.)
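
A minimal sketch of the length-prefix approach, in Python with only the
standard library.  It assumes the prefix is a base-128 varint, as
writeDelimitedTo() writes it; the helper names (write_delimited,
read_delimited) are hypothetical, and the payloads stand in for
already-serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        byte = raw[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_delimited(stream, payload):
    """Write the payload length as a varint, then the payload itself."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Read exactly one length-prefixed payload, or None at end of stream."""
    size = read_varint(stream)
    if size is None:
        return None
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a message")
    return payload
```

Because the reader consumes only the announced number of bytes, any
extra bytes that follow a message in an oversized buffer are never
handed to the message parser.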

Michael Poole




Re: [protobuf] Arbitrary corruption of repeated fields

2010-01-27 Thread Michael Poole
Stefan writes:

 What could I do to reduce the risk of losing the entire list due to
 arbitrary corruption? What if corruption only occurs at the end of the
 file, would it be simpler to recover all the elements up to the
 corruption point?

If you serialize the elements inside the Bag to the disk individually,
you could prefix them with a synchronizing marker and length.  A marker
would typically be a fixed-length pattern that is unlikely to appear in
legitimate data.  Starting it with a zero byte is a good choice for
Protocol Buffers data, because zero is never a valid field tag and so a
serialized message never begins with it.  The marker should also contain
some other (ideally uncommon) bytes for robustness.

By reading the marker, length, message, and checking the next marker,
your program can be reasonably sure that the detected message boundaries
are correct.  Recovery then becomes a matter of looking for the next
synchronizing marker, and checking it the same way.

There is obviously a tradeoff between how much data you can lose with a
corrupted message and the per-message overhead.  If you were using the
particular example in your email, you might serialize a Bag that
contains several Items rather than serializing each Item individually.
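
A minimal sketch of the marker-and-length framing described above, in
Python with only the standard library.  The marker bytes, the 4-byte
big-endian length, and the function names are all assumptions made for
illustration.  It checks record boundaries only; it does not detect
damage inside a payload whose length field is intact.

```python
import struct

MARKER = b"\x00\xa5\x5a\xc3"  # hypothetical pattern; starts with a zero byte
LENGTH = struct.Struct(">I")  # assumed 4-byte big-endian length field


def frame(payload):
    """Prefix one serialized record with the marker and its length."""
    return MARKER + LENGTH.pack(len(payload)) + payload


def recover(data):
    """Yield every payload whose boundaries check out, skipping damage."""
    pos = 0
    while True:
        pos = data.find(MARKER, pos)
        if pos < 0:
            return
        body = pos + len(MARKER) + LENGTH.size
        if body > len(data):
            return  # not enough bytes left for a complete header
        (size,) = LENGTH.unpack_from(data, pos + len(MARKER))
        end = body + size
        # Accept the record only if another marker or the end of the
        # data follows it; otherwise treat this marker as damaged.
        if end == len(data) or data.startswith(MARKER, end):
            yield data[body:end]
            pos = end
        else:
            pos += 1  # resynchronize on the next marker
```

With three framed records and a corrupted length in the second one,
recover() returns the first and third records and drops only the
damaged one.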

Michael Poole




Re: [protobuf] Instantiating a generic message object when you don't know what message type you should receive

2009-12-23 Thread Michael Poole
read.the news writes:

 Uncompilable code below:
 ===
   // We just obtained a generic message instance from a buffer.
   ::google::protobuf::Message *pMsgGeneric =
       MessageFactory.serializeFrom(inputRawMessageBuffer);

   // Later in the code, when the worker thread actually processes the
   // request and prepares a reply.
   if (pMsgGeneric.getType() == M1) {
     M1 *myConcreteM1Instance = (M1) pMsgGeneric;
   }
 ===

The inputRawMessageBuffer carries no type information saying that the
buffer contains a message of type M1 -- it just has a sequence of fields
that belong to M1 (or M2, or some other message type).  The reason for
the M3 layer is to provide a tag that identifies which message type is
enclosed.

Michael Poole





Re: How to write a protocol buffer message which correspond to java.util.Map

2009-08-26 Thread Michael Poole

DeWitt Clinton writes:

 I've also used the pattern:

   message Map {
     message Entry {
       optional string name = 1;
       optional string value = 2;
     }
     repeated Entry entries = 1;
   }

 Alkis, do you see benefits or downsides between the two approaches?

With a message for each pair, you spend a few extra bytes to identify
each embedded message (namely, the message tag and length), but you
don't have to worry about keeping two arrays synchronized with each
other.

My personal inclination would be for the embedded message unless the
dictionary has a very large number of elements -- but if it were that
large, I would probably not pass the whole thing in one chunk anyway.
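
A rough sketch of the size difference, in Python with only the standard
library.  It assumes the alternative to the Entry message is two
parallel repeated string fields (names = 1, values = 2), which is a
guess based on the "two arrays" remark above; the function names are
hypothetical.  The sizes follow the Protocol Buffers wire format for
length-delimited fields.

```python
def varint_size(value):
    """Number of bytes a base-128 varint needs for `value`."""
    size = 1
    while value > 0x7F:
        value >>= 7
        size += 1
    return size


def string_field_size(field_number, text):
    """Encoded size of one string field: tag + length + UTF-8 bytes."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return varint_size(tag) + varint_size(len(data)) + len(data)


def size_as_entries(pairs):
    """Size of `repeated Entry entries = 1` holding the given pairs."""
    total = 0
    for name, value in pairs:
        inner = string_field_size(1, name) + string_field_size(2, value)
        # Each Entry costs one tag plus one length on top of its contents.
        total += varint_size((1 << 3) | 2) + varint_size(inner) + inner
    return total


def size_as_parallel_arrays(pairs):
    """Size of the assumed alternative: two parallel repeated strings."""
    return sum(
        string_field_size(1, name) + string_field_size(2, value)
        for name, value in pairs
    )
```

For entries whose encoded contents stay under 128 bytes, the embedded
message costs two extra bytes per pair: one for the tag and one for the
length.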

Michael Poole




Re: field ordering

2009-08-20 Thread Michael Poole

roger peppe writes:

 the documentation says, on writing fields in order: "This allows
 parsing code to use optimizations that rely on field numbers being in
 sequence."

 what optimizations might these be?
 does the current implementation use any such optimizations?
 what penalty do i pay by *not* writing fields in order?

The C++ generator uses this optimization.  If you look at some of the
generated MergePartialFromCodedStream() methods, you can see how it is
done.

After parsing a field, the code peeks at the next tag, and if it has
the expected (in-sequence) value, needs only a single conditional
branch to go to the correct handler.  If the fields are not in order,
the parser must find the correct handler through a switch statement.
Because the path taken by that switch is hard to predict, the in-order
fast path can save considerable time on deeply pipelined CPUs.

Michael Poole




Re: Streaming different types of messages

2009-03-27 Thread Michael Poole

achin...@gmail.com writes:

 Thanks. Also how do I know the type of the message? One way would be
 to check all optional fields (each represent a different type of
 message) of the wrapper message and then pick the one which is not
 null. Is that the only way?

You can add a (required) field to indicate the intended contents of
the message, as described at
http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

Note that Protocol Buffers will not enforce the business rule that
the message's declared type and actual content must match, but it is
straightforward to create a wrapper that will do that.
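
A minimal sketch of such a wrapper check, in Python with only the
standard library.  The Envelope class stands in for a parsed wrapper
message; Kind, Envelope, payload_of, and the field names are all
hypothetical, and the payloads are left as raw bytes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    FOO = 1
    BAR = 2


@dataclass
class Envelope:
    """Stand-in for a parsed wrapper message; field names are hypothetical."""
    kind: Kind
    foo: Optional[bytes] = None
    bar: Optional[bytes] = None


def payload_of(envelope):
    """Return the payload named by `kind`, or raise if the content disagrees."""
    present = {
        Kind.FOO: envelope.foo is not None,
        Kind.BAR: envelope.bar is not None,
    }
    set_kinds = [kind for kind, is_set in present.items() if is_set]
    # Exactly one payload may be set, and it must be the declared one.
    if set_kinds != [envelope.kind]:
        raise ValueError("declared %s but found %s" % (envelope.kind, set_kinds))
    return envelope.foo if envelope.kind is Kind.FOO else envelope.bar
```

Callers then go through payload_of() instead of reading the optional
fields directly, so a message whose declared type and content disagree
is rejected in one place.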

Michael Poole
