Re: [protobuf] Re: Protocol buffers and large data sets
Thanks, that worked!

Terri

On Mon, May 24, 2010 at 4:46 PM, Kenton Varda wrote:
> My guess is that you're using a single CodedInputStream to read all your
> input, repeatedly calling message.ParseFromCodedStream(). Instead, create a
> new CodedInputStream for each message. If you construct it on the stack,
> there is no significant overhead to doing this:
>
>   while (true) {
>     CodedInputStream stream(&input);
>     // read one message, or break if at EOF
>   }

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
Re: [protobuf] Re: Protocol buffers and large data sets
My guess is that you're using a single CodedInputStream to read all your input, repeatedly calling message.ParseFromCodedStream(). Instead, create a new CodedInputStream for each message. If you construct it on the stack, there is no significant overhead to doing this:

    while (true) {
      CodedInputStream stream(&input);
      // read one message, or break if at EOF
    }

On Mon, May 24, 2010 at 12:21 PM, Terri wrote:
> Hi,
>
> I've been struggling to figure out just exactly how to do the many
> smaller messages approach. I've implemented this strategy, which is
> working except for a byte limit problem:
>
> http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no
>
> I also raised the byte limit using SetTotalBytesLimit to maxint.
>
> I use a Python program to read my data from disk and package it up
> into messages that are roughly 110 bytes each. Then I pipe it to a C++
> program that reads messages and crunches. But, I still have a problem
> because the total number of bytes of all my smaller messages is
> greater than maxint and the C++ fails to read when it hits the limit.
>
> I like the protobuf approach to passing data; I just need to remove
> that limit.
>
> What can I do?
>
> Thanks,
> Terri
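The fix above is consistent with the byte limit being counted over the lifetime of one CodedInputStream rather than per message. The following is only a toy model of that behaviour in plain Python, not protobuf's implementation: ToyCodedReader, its fields, and the tiny LIMIT are all hypothetical names and values chosen for illustration.

```python
import io


class ToyCodedReader:
    """Hypothetical stand-in for a reader with a lifetime byte budget.

    This is NOT protobuf's CodedInputStream; it only models a limit that
    counts every byte read since the reader was constructed.
    """

    def __init__(self, source, total_bytes_limit):
        self.source = source              # shared underlying byte stream
        self.limit = total_bytes_limit    # budget for this reader's lifetime
        self.consumed = 0                 # bytes read through this reader so far

    def read_message(self, size):
        if self.consumed + size > self.limit:
            raise OverflowError("total byte limit exceeded")
        data = self.source.read(size)
        self.consumed += len(data)
        return data


MESSAGE_SIZE = 110    # roughly the message size mentioned in the thread
LIMIT = 1000          # tiny budget so the effect shows up quickly
COUNT = 50            # 50 * 110 = 5500 bytes in total, well over LIMIT


def read_with_shared_reader(payload):
    """Reuse one reader for everything; return how many messages were read."""
    reader = ToyCodedReader(io.BytesIO(payload), LIMIT)
    done = 0
    try:
        for _ in range(COUNT):
            reader.read_message(MESSAGE_SIZE)
            done += 1
    except OverflowError:
        pass                              # budget exhausted part-way through
    return done


def read_with_fresh_readers(payload):
    """Build a new reader per message on top of the same underlying stream."""
    source = io.BytesIO(payload)
    done = 0
    for _ in range(COUNT):
        reader = ToyCodedReader(source, LIMIT)   # counter starts at zero again
        reader.read_message(MESSAGE_SIZE)
        done += 1
    return done


payload = bytes(MESSAGE_SIZE * COUNT)
shared = read_with_shared_reader(payload)
fresh = read_with_fresh_readers(payload)
print(shared, fresh)   # 9 50
```

In this model the shared reader stops after 9 messages (990 bytes), while the per-message readers get through all 50 because each one only ever sees a single message against its budget.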
[protobuf] Re: Protocol buffers and large data sets
Hi,

I've been struggling to figure out just exactly how to do the many smaller messages approach. I've implemented this strategy, which is working except for a byte limit problem:

http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no

I also raised the byte limit using SetTotalBytesLimit to maxint.

I use a Python program to read my data from disk and package it up into messages that are roughly 110 bytes each. Then I pipe it to a C++ program that reads messages and crunches. But, I still have a problem because the total number of bytes of all my smaller messages is greater than maxint and the C++ fails to read when it hits the limit.

I like the protobuf approach to passing data; I just need to remove that limit.

What can I do?

Thanks,
Terri

On May 17, 7:00 pm, Jason Hsueh wrote:
> There is a default byte size limit of 64MB when parsing protocol buffers -
> if a message is larger than that, it will fail to parse. This can be
> configured if you really need to parse larger messages, but it is generally
> not recommended. Additionally, ByteSize() returns a 32-bit integer, so
> there's an implicit limit on the size of data that can be serialized.
>
> You can certainly use protocol buffers in large data sets, but it's not
> recommended to have your entire data set be represented by a single message.
> Instead, see if you can break it up into smaller messages.
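As a rough check on the numbers in the message above, the arithmetic below estimates how many 110-byte messages it takes for a cumulative byte counter to reach maxint. It assumes maxint means the largest signed 32-bit value and ignores any framing bytes between messages, so the real count would be somewhat lower.

```python
INT32_MAX = 2**31 - 1          # assumed meaning of "maxint": 2,147,483,647
MESSAGE_SIZE = 110             # approximate size given in the message above

# Whole messages that fit before a cumulative byte counter passes INT32_MAX.
messages_before_limit = INT32_MAX // MESSAGE_SIZE
bytes_at_that_point = messages_before_limit * MESSAGE_SIZE

print(messages_before_limit)   # 19522578
print(bytes_at_that_point)     # 2147483580
```

Under those assumptions, a stream of about 19.5 million such messages is enough to hit the limit, however small each individual message is.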
Re: [protobuf] Protocol buffers and large data sets
There is a default byte size limit of 64MB when parsing protocol buffers - if a message is larger than that, it will fail to parse. This can be configured if you really need to parse larger messages, but it is generally not recommended. Additionally, ByteSize() returns a 32-bit integer, so there's an implicit limit on the size of data that can be serialized.

You can certainly use protocol buffers in large data sets, but it's not recommended to have your entire data set be represented by a single message. Instead, see if you can break it up into smaller messages.

On Mon, May 17, 2010 at 1:05 PM, sanikumbh wrote:
> I wanted to get some opinion on large data sets and protocol buffers.
> Protocol Buffer project page by google says that for data > 1
> megabyte, one should consider something different but they don’t
> mention what would happen if one crosses this limit. Are there any
> known failure modes when it comes to the large data sets?
> What are your observations, recommendations from your experience on
> this front?
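Breaking a data set into smaller messages means the stream needs record boundaries, because a serialized protocol buffer does not mark its own end. One common way to do that is to write each message's size before its bytes. The sketch below assumes a 4-byte little-endian size prefix (an arbitrary choice for illustration; a varint prefix is another option) and uses plain byte strings as stand-ins for serialized messages. write_record and read_records are hypothetical helper names.

```python
import io
import struct

LENGTH_PREFIX = struct.Struct("<I")   # assumed framing: 4-byte little-endian size


def write_record(stream, payload):
    """Write one record: its size, then its bytes."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield records one at a time until a clean end of stream."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return                      # EOF exactly on a record boundary
        if len(header) < LENGTH_PREFIX.size:
            raise EOFError("truncated length prefix")
        (size,) = LENGTH_PREFIX.unpack(header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated record")
        yield payload


# Stand-ins for serialized messages; real code would use SerializeToString().
records = [b"alpha", b"", b"x" * 110]
buffer = io.BytesIO()
for record in records:
    write_record(buffer, record)
buffer.seek(0)
round_tripped = list(read_records(buffer))
```

Because the reader is a generator, only one record needs to be in memory at a time, which is the point of splitting the data set in the first place.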
[protobuf] Protocol buffers and large data sets
I wanted to get some opinions on large data sets and protocol buffers. The Protocol Buffers project page by Google says that for data > 1 megabyte, one should consider something different, but they don’t mention what would happen if one crosses this limit. Are there any known failure modes when it comes to large data sets? What are your observations and recommendations from your experience on this front?
Re: Large data sets
On Tue, Oct 6, 2009 at 9:34 AM, Brenden Matthews wrote:
> it specifies that "if you are dealing in messages larger than a
> megabyte each, it may be time to consider an alternate strategy".
>
> My question is: does this apply to messages which are large because
> they themselves contain many (i.e., thousands) of small messages?

Yes. The issue here is not with protocol buffers, which can happily parse messages up to 2GB*. The issue is with your app's design. Most apps are better off splitting their data into smaller chunks which they can manipulate individually, rather than have one huge message that must be parsed and serialized all at once.

1MB is actually not that big in itself, but if 1MB is the average case for your app, then the worst case is probably much bigger. Furthermore, messages tend to grow over time, so what is 1MB now may be more in the future. Splitting messages lets your app scale smoothly.

All that said, it's possible you have an unusual case where these guidelines don't apply. Use your judgment.

* Note that protocol buffers by default will refuse to parse messages over 64MB as a security precaution -- we don't want to get anywhere near a situation where integer overflow might be a concern. However, you can increase that limit by creating a CodedInputStream manually and calling the appropriate method to set the message size limit. I don't recommend it, but it's there if needed.
Large data sets
Hi,

In the documentation here:

http://code.google.com/apis/protocolbuffers/docs/techniques.html#large-data

it specifies that "if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy".

My question is: does this apply to messages which are large because they themselves contain many (i.e., thousands) of small messages? Is it okay to pack the data in this fashion, or should I start packing each individual message manually using a type/length/value method?

Right now I have something like:

    message BLM {
    }

    message TLM {
      repeated BLM blm = 1;
    }

where there is only 1 TLM (top level message) and several BLMs (bottom level messages).

Thanks in advance,
Brenden
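For comparison with a manual type/length/value scheme: in the protocol buffer wire format, each element of a field like `repeated BLM blm = 1;` is already written as a tag, a varint length, and then the element's bytes. The tag for field number 1 with the length-delimited wire type is the single byte 0x0A. The sketch below walks such a buffer record by record in plain Python, without the protobuf library. It assumes the TLM contains nothing but field 1, treats each BLM as opaque bytes, and uses hypothetical helper names throughout.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)   # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


BLM_TAG = (1 << 3) | 2   # field number 1, wire type 2 (length-delimited) = 0x0A


def encode_tlm(blm_payloads):
    """Build the bytes of a TLM whose only content is repeated field 1."""
    out = bytearray()
    for payload in blm_payloads:
        out += encode_varint(BLM_TAG)
        out += encode_varint(len(payload))
        out += payload
    return bytes(out)


def iter_blm_payloads(tlm_bytes):
    """Yield each BLM's bytes in turn without parsing the whole TLM."""
    pos = 0
    while pos < len(tlm_bytes):
        tag, pos = decode_varint(tlm_bytes, pos)
        if tag != BLM_TAG:
            raise ValueError("unexpected field tag %d" % tag)
        size, pos = decode_varint(tlm_bytes, pos)
        yield tlm_bytes[pos:pos + size]
        pos += size


# Opaque stand-ins for serialized BLM messages of different sizes.
payloads = [b"\x08\x01", b"", b"z" * 200]
encoded = encode_tlm(payloads)
decoded = list(iter_blm_payloads(encoded))
```

So the bytes of one big TLM and a hand-rolled stream of length-prefixed BLMs look much alike. The practical difference is that parsing a TLM through the generated class materializes every BLM at once, while a record-by-record reader can handle them one at a time.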