Serializing Large Collections: SerializePartial ParsePartial
Is there a general strategy/methodology for dealing with very large collections so that they do not need to be held completely in memory before serializing and deserializing? I see, for example, SerializePartialToOstream() and ParsePartialFromIstream(), but no documentation of how to use them.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---
Re: Data structures using protocol buffers
Thanks Jeremy. That worked! But now information about the same node is replicated. For instance, let's say we have a field 'weight' attached to each node, as shown below. This setup replicates a node's weight as many times as its degree. If the weight of a node changes, I will have to update all its occurrences in the PB. Is there any way I can avoid that?

    package graph;
    option java_package = "graph";
    option java_outer_classname = "UndirectedGraphType";
    option optimize_for = CODE_SIZE;

    message UndirectedGraphNodeReference {
      required string id = 1;
      required double weight = 2;
    }

    message UndirectedGraphNode {
      required string id = 1;
      repeated UndirectedGraphNodeReference neighbors = 2;
    }

    message UndirectedGraph {
      repeated UndirectedGraphNode nodes = 1;
    }

On Oct 21, 6:37 pm, Jeremy Leader [EMAIL PROTECTED] wrote:

Keep in mind that protobufs describe serialized data, and there's no concept of an object reference like Java uses. In your example, if A and B are neighbors, then in your proto, the data representing A contains the data representing B, and the data representing B contains the data representing A!

One way around this is to implement your own form of references, perhaps using the node ids like this:

    package graph;
    option java_package = "graph";
    option java_outer_classname = "UndirectedGraph";
    option optimize_for = CODE_SIZE;

    message UndirectedGraphNodeReference {
      required string id = 1;
    }

    message UndirectedGraphNode {
      required string id = 1;
      repeated UndirectedGraphNodeReference neighbors = 2;
    }

    message UndirectedGraph {
      repeated UndirectedGraphNode nodes = 1;
    }

--
Jeremy Leader
[EMAIL PROTECTED]

GDR wrote:

Hi, I'm wondering how one would go about implementing self-referential data structures. As an exercise, I tried to implement a PB version of the adjacency-list representation of a graph. I'm having a hard time getting it to work. Any suggestions? Thanks!

--- graph.proto ---

    package graph;
    option java_package = "graph";
    option java_outer_classname = "UndirectedGraph";
    option optimize_for = CODE_SIZE;

    message UndirectedGraphNode {
      required string id = 1;
      repeated UndirectedGraphNode neighbors = 2;
    }

    message UndirectedGraph {
      repeated UndirectedGraphNode nodes = 1;
    }
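A minimal Java sketch of the id-reference idea from Jeremy's reply, assuming the working graph uses ordinary object references. GraphNode, NodeRecord, and GraphFlattener are hypothetical names, and NodeRecord is a plain stand-in for a generated UndirectedGraphNode message rather than the protobuf API. Flattening writes each node once and each neighbor only as its id, so a cycle such as A <-> B no longer turns into infinite nesting.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory node with direct object references; cycles such as A <-> B are fine here.
final class GraphNode {
    final String id;
    final List<GraphNode> neighbors = new ArrayList<>();

    GraphNode(String id) {
        this.id = id;
    }
}

// Hypothetical flat stand-in for one UndirectedGraphNode message:
// neighbors are stored as ids, never as nested nodes.
final class NodeRecord {
    final String id;
    final List<String> neighborIds;

    NodeRecord(String id, List<String> neighborIds) {
        this.id = id;
        this.neighborIds = neighborIds;
    }
}

final class GraphFlattener {
    // Adds an undirected edge by linking both nodes to each other.
    static void connect(GraphNode a, GraphNode b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }

    // Emits one record per node, replacing each neighbor object with its id,
    // so the output stays finite even when the object graph has cycles.
    static List<NodeRecord> flatten(List<GraphNode> nodes) {
        List<NodeRecord> records = new ArrayList<>();
        for (GraphNode node : nodes) {
            List<String> neighborIds = new ArrayList<>();
            for (GraphNode neighbor : node.neighbors) {
                neighborIds.add(neighbor.id);
            }
            records.add(new NodeRecord(node.id, neighborIds));
        }
        return records;
    }
}
```

Copying each NodeRecord into a real UndirectedGraphNode builder is left out here, since it depends on the generated code.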
Re: Data structures using protocol buffers
I was assuming all the properties of a node (weight, label, color, whatever) would be in UndirectedGraphNode; UndirectedGraphNodeReference would only have the id and nothing else.

--
Jeremy Leader
[EMAIL PROTECTED]
Re: Data structures using protocol buffers
That does solve the duplicate-information problem, but it makes updates to node attributes (like weight) difficult. Let's say I want to set the weight of each node to the average of its neighbors' weights:

    for (UndirectedGraphNode node : UndirectedGraph.getNodesList()) {
      double sum = 0;
      int count = 0;
      for (UndirectedGraphNodeReference neighbor : node.getNeighborsList()) {
        sum += neighbor.getWeight();
        count++;
      }
      node.setWeight(sum / count);
    }

--- graph.proto ---

    package graph;
    option java_package = "graph";
    option java_outer_classname = "UndirectedGraphType";
    option optimize_for = CODE_SIZE;

    message UndirectedGraphNodeReference {
      required string id = 1;
      required double weight = 2;
    }

    message UndirectedGraphNode {
      required string id = 1;
      repeated UndirectedGraphNodeReference neighbors = 2;
    }

    message UndirectedGraph {
      repeated UndirectedGraphNode nodes = 1;
    }
Re: Data structures using protocol buffers
My bad. The code snippet should be as follows:

    for (UndirectedGraphNode node : UndirectedGraph.getNodesList()) {
      double sum = 0;
      int count = 0;
      for (UndirectedGraphNodeReference neighbor : node.getNeighborsList()) {
        sum += neighbor.getWeight();
        count++;
      }
      node.setWeight(sum / count);
    }

--- graph.proto ---

    package graph;
    option java_package = "graph";
    option java_outer_classname = "UndirectedGraphType";
    option optimize_for = CODE_SIZE;

    message UndirectedGraphNodeReference {
      required string id = 1;
    }

    message UndirectedGraphNode {
      required string id = 1;
      required double weight = 2;
      repeated UndirectedGraphNodeReference neighbors = 3;
    }

    message UndirectedGraph {
      repeated UndirectedGraphNode nodes = 1;
    }
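When the weight lives only on UndirectedGraphNode, a neighbor reference carries nothing but an id, so averaging the neighbors' weights needs an id-to-weight lookup. A minimal sketch, assuming plain Java objects in place of the generated message classes (WeightedRecord and NeighborAverage are hypothetical names). It also returns new records instead of mutating in place, because generated Java protobuf messages are immutable and changes go through builders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for UndirectedGraphNode in the corrected graph.proto:
// the weight is stored once, on the node; neighbors are referenced by id.
final class WeightedRecord {
    final String id;
    final double weight;
    final List<String> neighborIds;

    WeightedRecord(String id, double weight, List<String> neighborIds) {
        this.id = id;
        this.weight = weight;
        this.neighborIds = neighborIds;
    }
}

final class NeighborAverage {
    // Returns new records whose weight is the mean of the neighbors' old weights.
    // Nodes without neighbors keep their current weight (avoids dividing by zero).
    static List<WeightedRecord> averageNeighborWeights(List<WeightedRecord> nodes) {
        // Snapshot of the old weights keyed by id, so every lookup sees
        // pre-update values regardless of iteration order.
        Map<String, Double> weightById = new HashMap<>();
        for (WeightedRecord node : nodes) {
            weightById.put(node.id, node.weight);
        }

        List<WeightedRecord> updated = new ArrayList<>();
        for (WeightedRecord node : nodes) {
            double sum = 0;
            int count = 0;
            for (String neighborId : node.neighborIds) {
                Double w = weightById.get(neighborId);
                if (w == null) {
                    throw new IllegalArgumentException("dangling reference: " + neighborId);
                }
                sum += w;
                count++;
            }
            double newWeight = count == 0 ? node.weight : sum / count;
            updated.add(new WeightedRecord(node.id, newWeight, node.neighborIds));
        }
        return updated;
    }
}
```

Reading from a snapshot is a deliberate choice: updating weights in place inside the loop, as in the original snippet, would let nodes visited later average a mix of old and new values.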
Re: Data structures using protocol buffers
Protocol Buffers are a serialization format, rather than general-purpose data structures. To do computations, you'd probably want to build some auxiliary data structures, which you populate when you deserialize the protobuf data. You could have node objects that resemble your original .proto file, where nodes have references to their neighbors, and you'd probably need a map from node id to node object reference, which you'd use during deserialization.

--
Jeremy Leader
[EMAIL PROTECTED]
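Jeremy's suggestion can be sketched in Java as a two-pass load, assuming the parsed messages have already been copied into simple objects (ParsedNode, LiveNode, and GraphLoader are hypothetical names, not protobuf API). The first pass creates one node object per id; the second resolves neighbor ids through the id-to-node map, so each node exists exactly once and a weight change is visible from every neighbor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed UndirectedGraphNode: neighbors by id only.
final class ParsedNode {
    final String id;
    final double weight;
    final List<String> neighborIds;

    ParsedNode(String id, double weight, List<String> neighborIds) {
        this.id = id;
        this.weight = weight;
        this.neighborIds = neighborIds;
    }
}

// Mutable working node with direct references to its neighbor objects.
final class LiveNode {
    final String id;
    double weight;
    final List<LiveNode> neighbors = new ArrayList<>();

    LiveNode(String id, double weight) {
        this.id = id;
        this.weight = weight;
    }
}

final class GraphLoader {
    // Pass 1 creates exactly one LiveNode per id; pass 2 resolves each
    // neighbor id through the id -> node map built in pass 1.
    static Map<String, LiveNode> load(List<ParsedNode> parsed) {
        Map<String, LiveNode> byId = new LinkedHashMap<>();
        for (ParsedNode p : parsed) {
            if (byId.put(p.id, new LiveNode(p.id, p.weight)) != null) {
                throw new IllegalArgumentException("duplicate node id: " + p.id);
            }
        }
        for (ParsedNode p : parsed) {
            LiveNode node = byId.get(p.id);
            for (String neighborId : p.neighborIds) {
                LiveNode neighbor = byId.get(neighborId);
                if (neighbor == null) {
                    throw new IllegalArgumentException("unknown neighbor id: " + neighborId);
                }
                node.neighbors.add(neighbor);
            }
        }
        return byId;
    }
}
```

Two passes are needed because a node may reference a neighbor that appears later in the list; computations then run on LiveNode objects, and results are written back to messages only when serializing again.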
Re: Make check reports error
On Wed, Oct 22, 2008 at 7:23 AM, Niclas Blomgren [EMAIL PROTECTED] wrote:

Solaris: I'm guessing this was a SPARC system, not x86? Was it 32-bit or 64-bit? I think someone else reported the same problem, but we were not able to track it down.

Cygwin: Can you include the text of the actual failures in your log? (You only included the summary.)
Re: Serializing Large Collections: SerializePartial ParsePartial
The Partial serialize and parse routines actually do something completely unrelated: they allow the message to be missing required fields. So, that doesn't help you.

I'm afraid protocol buffers are not designed for storing very large collections in a single message. Instead, you should be thinking about representing each item in a collection using a protocol message, but then using some other container format. Using protocol buffers here makes the container format simpler because it only needs to deal with raw strings rather than having to worry about structured data.
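One possible container format, as a minimal sketch: write each item as its own record, prefixed by a 4-byte big-endian length, so a reader can pull items off a stream one at a time. RecordStream is a hypothetical name and the 4-byte prefix is an assumption for illustration. Later protobuf-java releases also provide writeDelimitedTo / parseDelimitedFrom, which use a varint length prefix instead, so this sketch is not wire-compatible with them.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal record container: each record is a 4-byte big-endian length
// followed by that many bytes. The container treats records as opaque bytes;
// in practice each one would be a single serialized protocol message.
final class RecordStream {
    static void writeRecord(DataOutputStream out, byte[] record) throws IOException {
        out.writeInt(record.length);
        out.write(record);
    }

    // Returns the next record, or null if the stream ends cleanly between records.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null; // clean end of stream: no partial header was read
        }
        // Assemble the rest of the big-endian length; readUnsignedByte throws
        // EOFException if the header is truncated.
        int length = (first << 24)
                | (in.readUnsignedByte() << 16)
                | (in.readUnsignedByte() << 8)
                | in.readUnsignedByte();
        if (length < 0) {
            throw new IOException("corrupt record length: " + length);
        }
        byte[] record = new byte[length];
        in.readFully(record); // throws EOFException if the record body is truncated
        return record;
    }
}
```

A writer would call writeRecord(out, item.toByteArray()) once per item; a reader loops until readRecord returns null and parses each record with the generated class's parseFrom(byte[]), so only one item has to be in memory at a time.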
Re: python setup.py test failing
I'm also seeing this on an x86_64 machine. It seems like a problem with the test, as stated above. Has it been confirmed? Thanks. ~jason

On Oct 6, 7:34 pm, Kenton Varda [EMAIL PROTECTED] wrote:

Petar, can you look into this?

On Sun, Oct 5, 2008 at 2:47 AM, [EMAIL PROTECTED] wrote:

Hi, I'm trying to install and test out protobuf-2.0.2. I read the instructions in python/README.txt and ran

    $ python setup.py test

And I get:

    ======================================================================
    FAIL: testReadScalars (google.protobuf.internal.decoder_test.DecoderTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/frank/projects/protobuf-2.0.2/python/google/protobuf/internal/decoder_test.py", line 156, in testReadScalars
        self.ReadScalarTestHelper(*args)
      File "/home/frank/projects/protobuf-2.0.2/python/google/protobuf/internal/decoder_test.py", line 113, in ReadScalarTestHelper
        'Type of reslt %s not the expected one %s' % (type(result), type(expected_result
    AssertionError

After some digging into the code, I found that it's failing in decoder_test.py when testing the decoder.Decoder.ReadSFixed32() function. The expected value is long(-1), but the actual returned value was int(-1). The test passes if I change line 128 in /google/protobuf/internal/decoder_test.py from

    ['sfixed32', decoder.Decoder.ReadSFixed32, long(-1),

to

    ['sfixed32', decoder.Decoder.ReadSFixed32, -1,

Logically this change seems reasonable, but I can't tell for sure. Is this a bug in the download, or is this a platform-dependent problem? I'm running Python 2.5 on an x86_64 machine. This also fails with Python 2.4.

Thanks,
Frank
Re: Serializing Large Collections: SerializePartial ParsePartial
OK, that makes sense. Thanks for the quick reply.

I work at a seismic earthquake data center. We're looking at using protocol buffers as a means of internally moving around processed chunks of data. It seems to work pretty well, as long as the chunks aren't too large (which is a problem one way or another). Working with ~5 million data points doesn't seem to be any problem, though.

As for "some other container format": I'm not exactly sure what that would look like.

Thanks again.
-B
Any way to dissect Protobuf serialized data without knowing the structure in advance?
I'm trying to consume data from an app that generates output serialized via Protocol Buffers, but I do not have the original spec for the specific structures that have been encoded. Is there a relatively straightforward path to deserializing, or even just decoding, the serialized data stream without knowing its structure in advance? Hints/pointers of any variety would be welcome. Thanks. APB
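Without the .proto, the data can still be walked at the wire-format level: every field starts with a varint key equal to (field_number << 3) | wire_type, and the wire type says how to read or skip the value. Field names and exact types are lost, though. A varint might be an int32, int64, bool, or enum, and a length-delimited value might be a string, raw bytes, or a nested message. The protoc compiler's --decode_raw option produces this kind of dump from the command line. A minimal Java sketch of the same idea follows (RawDecoder is a hypothetical name; groups, wire types 3 and 4, are not handled).

```java
import java.util.ArrayList;
import java.util.List;

// Schema-less walk over protobuf wire format: reports field number, wire type,
// and raw value for each top-level field, without knowing what the fields mean.
final class RawDecoder {
    private final byte[] buf;
    private int pos = 0;

    RawDecoder(byte[] buf) {
        this.buf = buf;
    }

    // Base-128 varint: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last.
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) {
                throw new IllegalArgumentException("truncated varint");
            }
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    // Fixed-width values (wire types 1 and 5) are little-endian.
    private long readFixed(int size) {
        if (size > buf.length - pos) {
            throw new IllegalArgumentException("truncated fixed-width value");
        }
        long result = 0;
        for (int i = 0; i < size; i++) {
            result |= (long) (buf[pos++] & 0xFF) << (8 * i);
        }
        return result;
    }

    // Returns one line per top-level field, e.g. "1: varint 150".
    List<String> decodeAll() {
        List<String> fields = new ArrayList<>();
        while (pos < buf.length) {
            long key = readVarint();
            long fieldNumber = key >>> 3;
            int wireType = (int) (key & 0x7);
            switch (wireType) {
                case 0:
                    fields.add(fieldNumber + ": varint " + readVarint());
                    break;
                case 1:
                    fields.add(fieldNumber + ": fixed64 0x" + Long.toHexString(readFixed(8)));
                    break;
                case 2: {
                    long length = readVarint();
                    if (length < 0 || length > buf.length - pos) {
                        throw new IllegalArgumentException("bad length for field " + fieldNumber);
                    }
                    // The payload may be a string, raw bytes, or a nested message;
                    // it is skipped here and could be fed to a new RawDecoder.
                    fields.add(fieldNumber + ": length-delimited, " + length + " bytes");
                    pos += (int) length;
                    break;
                }
                case 5:
                    fields.add(fieldNumber + ": fixed32 0x" + Long.toHexString(readFixed(4)));
                    break;
                default:
                    // 3 and 4 are the deprecated start/end-group markers; 6 and 7 are invalid.
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }
}
```

A practical way to guess structure is to try decoding each length-delimited payload with a fresh RawDecoder: if it parses cleanly, it is probably a nested message; if not, it is more likely a string or raw bytes.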