Re: intermittent issue with encode (version 2.0.3)
As long as your threads do not start before main() is called, you are fine. Again, the function in question is guaranteed to be called before main() starts.

On Fri, Jul 10, 2009 at 3:52 PM, Rizzuto, Raymond wrote:
> Is there something I need to do in the application to cause that to
> happen before I’ve spun off multiple threads? Currently I spin up threads,
> which then start making calls to fill in and then encode google
> protobuffers.
RE: intermittent issue with encode (version 2.0.3)
Is there something I need to do in the application to cause that to happen before I've spun off multiple threads? Currently I spin up threads, which then start making calls to fill in and then encode google protobuffers.

From: Kenton Varda [mailto:ken...@google.com]
Sent: Thursday, July 09, 2009 6:05 PM
To: Rizzuto, Raymond
Cc: protobuf@googlegroups.com
Subject: Re: intermittent issue with encode (version 2.0.3)

As the comment says, the first call will always occur at startup time when there is only one thread anyway, so it's perfectly safe. The parenthetical about GCC4 is just an aside.
Re: intermittent issue with encode (version 2.0.3)
As the comment says, the first call will always occur at startup time when there is only one thread anyway, so it's perfectly safe. The parenthetical about GCC4 is just an aside.

On Thu, Jul 9, 2009 at 2:47 PM, Rizzuto, Raymond wrote:
> I am a bit nervous about the GCC4 comment in
> GeneratedMessageFactory::singleton (message.cc):
>
>   // No need for thread-safety here because this will be called at static
>   // initialization time. (And GCC4 makes this thread-safe anyway.)
>
> I’m using gcc 3.3.3.
>
> The singleton object in GeneratedMessageFactory::singleton is a local
> static of non-POD type. The C++ standard says:
>
>   An implementation is permitted to perform early initialization of other
>   local objects with static storage duration under the same conditions that
>   an implementation is permitted to statically initialize an object with
>   static storage duration in namespace scope (3.6.2). Otherwise such an
>   object is initialized the first time control passes through its
>   declaration; such an object is considered initialized upon the completion
>   of its initialization.
>
> I don’t think the language standard addresses what “first time control
> passes through its declaration” means when two threads call the function
> simultaneously. Perhaps gcc4 provides features that make that safe. I
> don’t know if that is something that can be relied on in all compilers,
> however.
>
> Ray
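The startup-time argument above can be illustrated with a small sketch. It is not protobuf code: `Registry` and `ForceRegistryInit` are hypothetical names standing in for a lazily constructed singleton and for whatever static initializer first touches it. The idea is that a namespace-scope object calls the accessor during static initialization, so the function-local static is already constructed when main() begins and no later thread can be the first caller.

```cpp
// Hypothetical stand-in for a singleton held in a function-local static.
class Registry {
 public:
  static Registry* singleton() {
    // Constructed the first time control passes through this declaration.
    static Registry instance;
    return &instance;
  }

  // Number of times the constructor has run (expected to stay at 1).
  static int constructions() { return construction_count(); }

 private:
  Registry() { ++construction_count(); }

  static int& construction_count() {
    // Plain int with a constant initializer: set up before any code runs.
    static int count = 0;
    return count;
  }
};

// Namespace-scope object whose constructor touches the singleton. Its
// dynamic initialization runs before main(), while the process normally
// has only one thread, so the first call to singleton() cannot race.
struct ForceRegistryInit {
  ForceRegistryInit() { Registry::singleton(); }
};
static ForceRegistryInit force_registry_init;
```

This only helps if no thread is started from another static initializer, since the order of initialization across translation units is unspecified. An application that wants to be explicit can also call the accessor once at the top of main() before creating any threads.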
RE: intermittent issue with encode (version 2.0.3)
I am a bit nervous about the GCC4 comment in GeneratedMessageFactory::singleton (message.cc):

  // No need for thread-safety here because this will be called at static
  // initialization time. (And GCC4 makes this thread-safe anyway.)

I'm using gcc 3.3.3.

The singleton object in GeneratedMessageFactory::singleton is a local static of non-POD type. The C++ standard says:

  An implementation is permitted to perform early initialization of other
  local objects with static storage duration under the same conditions that
  an implementation is permitted to statically initialize an object with
  static storage duration in namespace scope (3.6.2). Otherwise such an
  object is initialized the first time control passes through its
  declaration; such an object is considered initialized upon the completion
  of its initialization.

I don't think the language standard addresses what "first time control passes through its declaration" means when two threads call the function simultaneously. Perhaps gcc4 provides features that make that safe. I don't know if that is something that can be relied on in all compilers, however.

Ray

From: Kenton Varda [mailto:ken...@google.com]
Sent: Thursday, July 09, 2009 5:08 PM
To: Rizzuto, Raymond
Cc: protobuf@googlegroups.com
Subject: Re: intermittent issue with encode (version 2.0.3)

I suppose you could also temporarily edit the header file.
RE: intermittent issue with encode (version 2.0.3)
Exactly. Basically I want to perform any "standard" validation on content across all fields of the message, not application-level validation. Since I have a number of complicated messages in my application, it would be somewhat tedious to write that code. If the compiler can add it automatically, or if there was a way of iterating/introspecting the object to validate it, that would be great.

From: Kenton Varda [mailto:ken...@google.com]
Sent: Thursday, July 09, 2009 5:34 PM
To: Rizzuto, Raymond
Cc: protobuf@googlegroups.com
Subject: Re: intermittent issue with encode (version 2.0.3)

Sorry, I think I misread your message. You just want there to be a method like IsInitialized() that you can call to validate UTF-8 stuff. I'll think about that.
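Until the library offers such a check, an application can test its own string values before assigning them to `string` fields. The following is a minimal sketch of a generic UTF-8 well-formedness check; it is not the routine protobuf uses internally, and the names `FirstInvalidUtf8Offset` and `IsValidUtf8` are made up for illustration. Returning the offset of the first bad byte gives some of the extra context asked for in this thread.

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset where the first malformed UTF-8 sequence starts,
// or std::string::npos if the whole string is well-formed UTF-8.
std::size_t FirstInvalidUtf8Offset(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len;       // total bytes in this sequence
    unsigned long min_cp;  // smallest code point legal for this length
    unsigned long cp;      // code point decoded so far
    if (lead < 0x80) {
      len = 1; min_cp = 0x0; cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; min_cp = 0x80; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; min_cp = 0x800; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; min_cp = 0x10000; cp = lead & 0x07;
    } else {
      return i;  // stray continuation byte or illegal lead byte
    }
    if (len > n - i) return i;  // sequence runs past the end of the string
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return i;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return i;                   // overlong encoding
    if (cp > 0x10FFFF) return i;                 // beyond the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // UTF-16 surrogate
    i += len;
  }
  return std::string::npos;
}

// Convenience wrapper for callers that only need a yes/no answer.
bool IsValidUtf8(const std::string& s) {
  return FirstInvalidUtf8Offset(s) == std::string::npos;
}
```

Calling this on each value just before it is stored in the message lets the caller log the field name and offending offset at the point where the bad data enters, rather than at serialization time.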
Re: intermittent issue with encode (version 2.0.3)
Sorry, I think I misread your message. You just want there to be a method like IsInitialized() that you can call to validate UTF-8 stuff. I'll think about that.

On Thu, Jul 9, 2009 at 2:32 PM, Kenton Varda wrote:
> This is something you can do in your own code -- just call your validation
> function before serializing. If this were to be a "feature" of protocol
> buffers, then we'd have to store a pointer to your validator function
> somewhere. Storing it in the message object itself would harm performance
> and memory usage, but storing it in a static location (such that it applies
> to all instances of the type) would bring all the myriad problems commonly
> associated with singletons. So I don't think there's any reasonable way for
> the protobuf system to provide this.
Re: intermittent issue with encode (version 2.0.3)
This is something you can do in your own code -- just call your validation function before serializing. If this were to be a "feature" of protocol buffers, then we'd have to store a pointer to your validator function somewhere. Storing it in the message object itself would harm performance and memory usage, but storing it in a static location (such that it applies to all instances of the type) would bring all the myriad problems commonly associated with singletons. So I don't think there's any reasonable way for the protobuf system to provide this.

On Thu, Jul 9, 2009 at 2:14 PM, Rizzuto, Raymond wrote:
> I’m going to try that. Since another group builds and packages the
> libraries I use, it’ll take a bit to make a private copy with that change.
>
> As an enhancement request, I wish there was a function I could call to
> validate the message content before serializing, that would tell me about
> any fields of the message that are in error. I.e. so I could catch that
> issue similarly to catching uninitialized fields:
>
>   if (!m.IsInitialized())
>   {
>       std::string error = name + " is missing fields: ";
>       std::vector<std::string> errors;
>       m.FindInitializationErrors(&errors);
>       std::vector<std::string>::const_iterator it;
>       for (it = errors.begin(); it != errors.end(); ++it)
>       {
>           if (it != errors.begin())
>               error += ", ";
>           error += *it;
>       }
>       throw SPException(error.c_str());
>   }
>
> It might not be something I’d do in production, but it sure would help
> during development.
RE: intermittent issue with encode (version 2.0.3)
I'm going to try that. Since another group builds and packages the libraries I use, it'll take a bit to make a private copy with that change.

As an enhancement request, I wish there were a function I could call to validate the message content before serializing, one that would tell me about any fields of the message that are in error, i.e. so I could catch that issue much as I catch uninitialized fields:

    if (!m.IsInitialized()) {
        std::string error = name + " is missing fields: ";
        std::vector<std::string> errors;
        m.FindInitializationErrors(&errors);
        std::vector<std::string>::const_iterator it;
        for (it = errors.begin(); it != errors.end(); ++it) {
            if (it != errors.begin()) error += ", ";
            error += *it;
        }
        throw SPException(error.c_str());
    }

It might not be something I'd do in production, but it sure would help during development.

From: Kenton Varda [mailto:ken...@google.com]
Sent: Thursday, July 09, 2009 5:08 PM
To: Rizzuto, Raymond
Cc: protobuf@googlegroups.com
Subject: Re: intermittent issue with encode (version 2.0.3)

I suppose you could also temporarily edit the header file.

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.
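The pre-serialization check wished for in the message above might look something like the following. This is only a sketch: `FirstInvalidUtf8Offset` is a hypothetical standalone helper, not a protobuf function, and its notion of "valid UTF-8" (no overlong forms, no surrogates, nothing above U+10FFFF) may differ in edge cases from whatever check the library performs at wire_format_inl.h:289.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of protobuf): returns the byte offset of the
// first invalid UTF-8 sequence in `s`, or std::string::npos if `s` is valid.
std::size_t FirstInvalidUtf8Offset(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;       // total bytes in this sequence
    unsigned long cp;      // code point decoded so far
    unsigned long min_cp;  // smallest code point allowed for this length
    if (b0 < 0x80) {
      len = 1; cp = b0; min_cp = 0;
    } else if ((b0 & 0xE0) == 0xC0) {
      len = 2; cp = b0 & 0x1F; min_cp = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3; cp = b0 & 0x0F; min_cp = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4; cp = b0 & 0x07; min_cp = 0x10000;
    } else {
      return i;  // stray continuation byte or impossible lead byte
    }
    if (len > n - i) return i;  // sequence runs past the end of the string
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return i;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return i;                   // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return i;                 // beyond the Unicode range
    i += len;
  }
  return std::string::npos;
}
```

A caller would run this over each string field just before serializing and, on a result other than `std::string::npos`, report the field name and offset in the same way the `IsInitialized()` snippet reports missing fields. Which fields to check is left to the caller, since the accessors are specific to each generated message type.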
Re: intermittent issue with encode (version 2.0.3)
I suppose you could also temporarily edit the header file.

On Thu, Jul 9, 2009 at 2:05 PM, Rizzuto, Raymond wrote:
> I’m trying to, without success. Breakpoints in header files, at least
> with the version of tools I have, don’t work very well.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
RE: intermittent issue with encode (version 2.0.3)
I'm trying to, without success. Breakpoints in header files, at least with the version of tools I have, don't work very well.

From: Kenton Varda [mailto:ken...@google.com]
Sent: Thursday, July 09, 2009 5:02 PM
To: Rizzuto, Raymond
Cc: protobuf@googlegroups.com
Subject: Re: intermittent issue with encode (version 2.0.3)

Run in a debugger and set a breakpoint at wire_format_inl.h:289.
Re: intermittent issue with encode (version 2.0.3)
Run in a debugger and set a breakpoint at wire_format_inl.h:289.

On Thu, Jul 9, 2009 at 1:56 PM, Rizzuto, Raymond wrote:
> I think I have an error in my code (C++) that only occurs when I have
> multiple threads, and a lot of message volume. Even then, I can run the
> same test many times, but only get a failure on some runs. With 7 threads
> running on a 4 core machine, and generating 480384 google protocol buffer
> messages, I get 33 errors like this to stdout:
>
> libprotobuf ERROR
> /siglinux/tc/sles9sp4_gcc-3.3.3_i686/sig1/protobuf-2.0.3/include/google/protobuf/wire_format_inl.h:289]
> Encountered string containing invalid UTF-8 data while serializing protocol
> buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes.
intermittent issue with encode (version 2.0.3)
I think I have an error in my code (C++) that only occurs when I have multiple threads, and a lot of message volume. Even then, I can run the same test many times, but only get a failure on some runs. With 7 threads running on a 4 core machine, and generating 480384 google protocol buffer messages, I get 33 errors like this to stdout:

libprotobuf ERROR /siglinux/tc/sles9sp4_gcc-3.3.3_i686/sig1/protobuf-2.0.3/include/google/protobuf/wire_format_inl.h:289] Encountered string containing invalid UTF-8 data while serializing protocol buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes.

I believe that the data is in error since I get similar errors decoding the messages:

libprotobuf ERROR /siglinux/tc/sles9sp4_gcc-3.3.3_i686/sig1/protobuf-2.0.3/include/google/protobuf/wire_format_inl.h:138] Encountered string containing invalid UTF-8 data while parsing protocol buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes.

Is there any way that I can check for this at run time so that I can print out more context? I do call IsInitialized before serializing, but that doesn't check for this case.

I am running on SLES9SP4, using gcc 3.3.3 as the compiler.

Ray

Ray Rizzuto
raymond.rizz...@sig.com
Susquehanna International Group
(610)747-2336 (W)
(215)776-3780 (C)
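On the question of printing more context at run time: once a suspect string has been found, it helps to log its raw bytes in a form that survives the log file. The sketch below assumes a hypothetical helper, `EscapeForLog`, which is not a protobuf function; it simply writes every byte outside printable ASCII (and the backslash itself) as `\xNN`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical logging helper (not part of protobuf): returns a copy of `s`
// in which every byte outside printable ASCII, plus the backslash, is
// written as \xNN so corrupted data can be logged safely.
std::string EscapeForLog(const std::string& s) {
  const char kHex[] = "0123456789abcdef";
  std::string out;
  out.reserve(s.size());
  for (std::size_t i = 0; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c >= 0x20 && c < 0x7F && c != '\\') {
      out += static_cast<char>(c);  // printable ASCII passes through
    } else {
      out += "\\x";                 // everything else becomes \xNN
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}
```

Logging the escaped value together with the message type, the field name, and the thread that filled it in would give a record to compare against the 33 failures, without depending on a breakpoint inside the header.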