Re: [protobuf] Re: Protocol buffers and large data sets

2010-05-27 Thread Terri Kamm
Thanks, that worked!

Terri


On Mon, May 24, 2010 at 4:46 PM, Kenton Varda  wrote:
> My guess is that you're using a single CodedInputStream to read all your
> input, repeatedly calling message.ParseFromCodedStream().  Instead, create a
> new CodedInputStream for each message.  If you construct it on the stack,
> there is no significant overhead to doing this:
>   while (true) {
>     CodedInputStream stream(&input);
>     // read one message, or break if at EOF
>   }
>
> On Mon, May 24, 2010 at 12:21 PM, Terri  wrote:
>>
>> Hi,
>>
>> I've been struggling to figure out just exactly how to do the many
>> smaller messages approach. I've implemented this strategy, which is
>> working except for a byte limit problem:
>>
>>
>> http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no
>>
>> I also raised the byte limit using SetTotalBytesLimit to maxint.
>>
>> I use a Python program to read my data from disk and package it up
>> into messages that are roughly 110 bytes each. Then I pipe them to a C++
>> program that reads the messages and crunches them. But I still have a
>> problem because the total number of bytes across all my smaller messages
>> is greater than maxint, and the C++ program fails to read once it hits
>> the limit.
>>
>> I like the protobuf approach to passing data, I just need to remove
>> that limit.
>>
>> What can I do?
>>
>> Thanks,
>> Terri
>>
>> On May 17, 7:00 pm, Jason Hsueh  wrote:
>> > There is a default byte size limit of 64MB when parsing protocol
>> > buffers - if a message is larger than that, it will fail to parse.
>> > This can be configured if you really need to parse larger messages,
>> > but it is generally not recommended. Additionally, ByteSize() returns
>> > a 32-bit integer, so there's an implicit limit on the size of data
>> > that can be serialized.
>> >
>> > You can certainly use protocol buffers in large data sets, but it's not
>> > recommended to have your entire data set be represented by a single
>> > message.
>> > Instead, see if you can break it up into smaller messages.
>> >
>> >
>> >
>> > On Mon, May 17, 2010 at 1:05 PM, sanikumbh  wrote:
>> > > I wanted to get some opinions on large data sets and protocol
>> > > buffers. The Protocol Buffers project page by Google says that for
>> > > data larger than 1 megabyte one should consider something different,
>> > > but it doesn't mention what happens when one crosses this limit. Are
>> > > there any known failure modes when it comes to large data sets?
>> > > What are your observations and recommendations from your experience
>> > > on this front?
>> >
>> >
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: Protocol buffers and large data sets

2010-05-24 Thread Kenton Varda
My guess is that you're using a single CodedInputStream to read all your
input, repeatedly calling message.ParseFromCodedStream().  Instead, create a
new CodedInputStream for each message.  If you construct it on the stack,
there is no significant overhead to doing this:

  while (true) {
    CodedInputStream stream(&input);
    // read one message, or break if at EOF
  }
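For this to work, each message on the wire needs its own framing so the reader knows where one message ends and the next begins. A minimal pure-Python sketch of varint length-prefixed framing, the scheme protobuf's wire format uses for lengths (the helper names here are illustrative, not part of the protobuf API; on the C++ side you would read the same prefix and construct a fresh CodedInputStream per record, as above):

```python
import io

def write_varint(out, value):
    # Encode a non-negative int as a base-128 varint (low 7 bits first,
    # high bit set on every byte except the last).
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([byte | 0x80]))
        else:
            out.write(bytes([byte]))
            return

def read_varint(inp):
    # Decode one varint; return None at clean EOF.
    first = inp.read(1)
    if not first:
        return None
    shift, result, byte = 0, 0, first[0]
    while True:
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7
        byte = inp.read(1)[0]

def write_delimited(out, payload):
    # Frame one serialized message: varint length, then the bytes.
    write_varint(out, len(payload))
    out.write(payload)

def read_delimited(inp):
    # Read one framed message, or None at EOF.
    size = read_varint(inp)
    if size is None:
        return None
    return inp.read(size)

# Round-trip three "messages" through an in-memory pipe.
buf = io.BytesIO()
for msg in [b"alpha", b"beta", b"gamma"]:
    write_delimited(buf, msg)
buf.seek(0)
records = []
while True:
    rec = read_delimited(buf)
    if rec is None:
        break
    records.append(rec)
print(records)  # -> [b'alpha', b'beta', b'gamma']
```

Because each record carries its own length, the reader never has to know the total stream size, so an unbounded stream of small messages works fine.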

On Mon, May 24, 2010 at 12:21 PM, Terri  wrote:

> Hi,
>
> I've been struggling to figure out just exactly how to do the many
> smaller messages approach. I've implemented this strategy, which is
> working except for a byte limit problem:
>
>
> http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no
>
> I also raised the byte limit using SetTotalBytesLimit to maxint.
>
> I use a Python program to read my data from disk and package it up
> into messages that are roughly 110 bytes each. Then I pipe them to a C++
> program that reads the messages and crunches them. But I still have a
> problem because the total number of bytes across all my smaller messages
> is greater than maxint, and the C++ program fails to read once it hits
> the limit.
>
> I like the protobuf approach to passing data, I just need to remove
> that limit.
>
> What can I do?
>
> Thanks,
> Terri
>
> On May 17, 7:00 pm, Jason Hsueh  wrote:
> > There is a default byte size limit of 64MB when parsing protocol
> > buffers - if a message is larger than that, it will fail to parse.
> > This can be configured if you really need to parse larger messages,
> > but it is generally not recommended. Additionally, ByteSize() returns
> > a 32-bit integer, so there's an implicit limit on the size of data
> > that can be serialized.
> >
> > You can certainly use protocol buffers in large data sets, but it's
> > not recommended to have your entire data set be represented by a
> > single message. Instead, see if you can break it up into smaller
> > messages.
> >
> >
> >
> > On Mon, May 17, 2010 at 1:05 PM, sanikumbh  wrote:
> > > I wanted to get some opinions on large data sets and protocol
> > > buffers. The Protocol Buffers project page by Google says that for
> > > data larger than 1 megabyte one should consider something different,
> > > but it doesn't mention what happens when one crosses this limit. Are
> > > there any known failure modes when it comes to large data sets?
> > > What are your observations and recommendations from your experience
> > > on this front?
> >
> >
>




[protobuf] Re: Protocol buffers and large data sets

2010-05-24 Thread Terri
Hi,

I've been struggling to figure out just exactly how to do the many
smaller messages approach. I've implemented this strategy, which is
working except for a byte limit problem:

http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no

I also raised the byte limit using SetTotalBytesLimit to maxint.

I use a Python program to read my data from disk and package it up
into messages that are roughly 110 bytes each. Then I pipe them to a C++
program that reads the messages and crunches them. But I still have a
problem because the total number of bytes across all my smaller messages
is greater than maxint, and the C++ program fails to read once it hits
the limit.

I like the protobuf approach to passing data, I just need to remove
that limit.

What can I do?

Thanks,
Terri

On May 17, 7:00 pm, Jason Hsueh  wrote:
> There is a default byte size limit of 64MB when parsing protocol buffers -
> if a message is larger than that, it will fail to parse. This can be
> configured if you really need to parse larger messages, but it is generally
> not recommended. Additionally, ByteSize() returns a 32-bit integer, so
> there's an implicit limit on the size of data that can be serialized.
>
> You can certainly use protocol buffers in large data sets, but it's not
> recommended to have your entire data set be represented by a single message.
> Instead, see if you can break it up into smaller messages.
>
>
>
> On Mon, May 17, 2010 at 1:05 PM, sanikumbh  wrote:
> > I wanted to get some opinions on large data sets and protocol
> > buffers. The Protocol Buffers project page by Google says that for
> > data larger than 1 megabyte one should consider something different,
> > but it doesn't mention what happens when one crosses this limit. Are
> > there any known failure modes when it comes to large data sets?
> > What are your observations and recommendations from your experience
> > on this front?
>
>




Re: [protobuf] Protocol buffers and large data sets

2010-05-17 Thread Jason Hsueh
There is a default byte size limit of 64MB when parsing protocol buffers -
if a message is larger than that, it will fail to parse. This can be
configured if you really need to parse larger messages, but it is generally
not recommended. Additionally, ByteSize() returns a 32-bit integer, so
there's an implicit limit on the size of data that can be serialized.

You can certainly use protocol buffers in large data sets, but it's not
recommended to have your entire data set be represented by a single message.
Instead, see if you can break it up into smaller messages.
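One concrete way to act on this advice: serialize records individually and group them into bounded-size chunks, so no single parse ever approaches the 64MB ceiling. A hedged pure-Python sketch (the function name and the 1 MB chunk budget are illustrative; the ~110-byte payloads mirror the message sizes mentioned later in this thread):

```python
def batch_under_limit(payloads, limit):
    # Group already-serialized records into chunks whose combined size
    # stays at or under `limit` bytes, so each chunk can be parsed as
    # one small message instead of one giant one.
    batches, current, size = [], [], 0
    for p in payloads:
        if current and size + len(p) > limit:
            batches.append(current)
            current, size = [], 0
        current.append(p)
        size += len(p)
    if current:
        batches.append(current)
    return batches

# 10,000 records of ~110 bytes each, grouped into chunks of at most 1 MB.
payloads = [b"x" * 110 for _ in range(10_000)]
batches = batch_under_limit(payloads, 1 << 20)
assert all(sum(len(p) for p in b) <= 1 << 20 for b in batches)
```

A record larger than `limit` still gets its own chunk here; a real pipeline would probably want to reject such records instead.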

On Mon, May 17, 2010 at 1:05 PM, sanikumbh  wrote:

> I wanted to get some opinions on large data sets and protocol buffers.
> The Protocol Buffers project page by Google says that for data larger
> than 1 megabyte one should consider something different, but it doesn't
> mention what happens when one crosses this limit. Are there any known
> failure modes when it comes to large data sets?
> What are your observations and recommendations from your experience on
> this front?
>




[protobuf] Protocol buffers and large data sets

2010-05-17 Thread sanikumbh
I wanted to get some opinions on large data sets and protocol buffers.
The Protocol Buffers project page by Google says that for data larger
than 1 megabyte one should consider something different, but it doesn't
mention what happens when one crosses this limit. Are there any known
failure modes when it comes to large data sets?
What are your observations and recommendations from your experience on
this front?




Re: Large data sets

2009-10-06 Thread Kenton Varda
On Tue, Oct 6, 2009 at 9:34 AM, Brenden Matthews wrote:

> it specifies that "if you are dealing in messages larger than a
> megabyte each, it may be time to consider an alternate strategy".
>
> My question is: does this apply to messages which are large because
> they themselves contain many (i.e., thousands) of small messages?


Yes.  The issue here is not with protocol buffers, which can happily parse
messages up to 2GB*.  The issue is with your app's design.  Most apps are
better off splitting their data into smaller chunks which they can
manipulate individually, rather than have one huge message that must be
parsed and serialized all at once.  1MB is actually not that big in itself,
but if 1MB is the average case for your app, then the worst case is probably
much bigger.  Furthermore, messages tend to grow over time, so what is 1MB
now may be more in the future.  Splitting messages lets your app scale
smoothly.

All that said, it's possible you have an unusual case where these guidelines
don't apply.  Use your judgment.

* Note that protocol buffers by default will refuse to parse messages over
64MB as a security precaution -- we don't want to get anywhere near a
situation where integer overflow might be a concern.  However, you can
increase that limit by creating a CodedInputStream manually and calling the
appropriate method to set the message size limit.  I don't recommend it, but
it's there if needed.
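The behavior described above can be sketched abstractly as a reader that tracks cumulative bytes consumed and refuses to read past a configurable ceiling. This is a pure-Python analogue, not protobuf code (LimitedReader and its methods are made-up names; the real control in C++ is the total-bytes-limit setter on CodedInputStream, and the count resets naturally when you construct a fresh stream object per message):

```python
import io

DEFAULT_TOTAL_BYTES_LIMIT = 64 << 20  # 64 MB, the default ceiling

class LimitedReader:
    """Wraps a stream; fails once cumulative reads would exceed a limit."""

    def __init__(self, stream, limit=DEFAULT_TOTAL_BYTES_LIMIT):
        self._stream = stream
        self._limit = limit
        self._consumed = 0

    def set_total_bytes_limit(self, limit):
        # Analogue of raising the limit on a CodedInputStream.
        self._limit = limit

    def read(self, n):
        # Conservatively check before reading, like the parser does.
        if self._consumed + n > self._limit:
            raise ValueError("total bytes limit exceeded")
        data = self._stream.read(n)
        self._consumed += len(data)
        return data

# A reader with a tiny limit fails partway through; raising the limit
# (or starting a fresh reader per message) avoids the failure.
src = io.BytesIO(b"0123456789")
r = LimitedReader(src, limit=4)
r.read(4)
try:
    r.read(1)
except ValueError:
    pass  # hit the ceiling, as the default parser would at 64 MB
```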




Large data sets

2009-10-06 Thread Brenden Matthews

Hi,

In the documentation here:

http://code.google.com/apis/protocolbuffers/docs/techniques.html#large-data

it specifies that "if you are dealing in messages larger than a
megabyte each, it may be time to consider an alternate strategy".

My question is: does this apply to messages which are large because
they themselves contain many (i.e., thousands) of small messages?  Is
it okay to pack the data in this fashion, or should I start packing
each individual message manually using a type/length/value method?

Right now I have something like:

message BLM {

}

message TLM {
repeated BLM blm = 1;
}

where there is only 1 TLM (top level message) and several BLMs (bottom
level messages).
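For a layout like this, the usual alternative (an assumption on my part, not something spelled out in this thread) is to drop the TLM container entirely and write each BLM to the file or pipe length-delimited, so the parser only ever sees one small BLM at a time:

```proto
// Hypothetical sketch: keep BLM as-is and remove the TLM wrapper.
// The stream then carries a sequence of
//   <varint length><serialized BLM>
// records, and the reader parses one BLM per iteration with a fresh
// CodedInputStream instead of parsing one giant TLM.

message BLM {
  // ... fields unchanged ...
}
```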

Thanks in advance,

Brenden