[protobuf] Re: Install python protobuf in user folder
Never mind, I had to select the MacPorts python for this to work. -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To view this discussion on the web visit https://groups.google.com/d/msg/protobuf/-/VVJ0bHA7IsQJ. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Install python protobuf in user folder
Hi, I have OS X Lion and would like to install the protobuf 2.4.1 Python libraries in a user folder. The C++ part was successfully installed with MacPorts. However, the port protobuf-python27 does not seem to have any effect: I've installed it, but cannot find any installed files. I've also tried to follow the easy_install instructions but cannot even install the easy_path system. Is there an easy out-of-the-box solution, similar to C++ protobuf but for Python? Thanks. -- To view this discussion on the web visit https://groups.google.com/d/msg/protobuf/-/pxMZxlaw3C8J.
[protobuf] Re: Copy nested repeated messages
Oh, I forgot to add some info on the ProtoBuf version: I use v2.3.0 with C++. -- To view this discussion on the web visit https://groups.google.com/d/msg/protobuf/-/Zk5LNGpDN1R2Q3NK.
[protobuf] Copy nested repeated messages
Hi, I've got nested messages like:

    message A { required double value = 1; }
    message B { required A a = 1; }
    message C { repeated B entries = 1; }

The C object is saved in a file and may have any number of B entries. Now I'd like to save a copy of C in a different file, keeping only the B's that match some criteria, e.g.:

    C original;
    // ... read object from file ...
    C *copy = new C();
    typedef ::google::protobuf::RepeatedPtrField<B> Bs;
    for (Bs::const_iterator b = original.entries().begin();
         original.entries().end() != b; ++b)
    {
        if (b->a().value() < 5)
            continue;
        // Copy the B entry and add it to the filtered C
        B *entry = copy->add_entries();
        *entry = *b;
    }
    // save copy

Sometimes I get this error message upon reading the filtered C:

    libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "C" because it is missing required fields: entries[0].A

It seems that the deep copy of nested repeated messages failed for some reason. Any ideas how to fix this? -- To view this discussion on the web visit https://groups.google.com/d/msg/protobuf/-/WmZZUDM2ZDR0ZmNK.
[protobuf] ProtoBuf 2.4.0(a)
Hi, I am a bit confused: has the final version v2.4.0 of ProtoBuf been released? I've heard from multiple sources that it has, even though the official project web page links to v2.4.0a. Is it an "alpha" version? What does the "a" mean? Is it a stable release? Thanks.
[protobuf] Re: Inheritance..
Hi, I guess ProtoBuf was designed from the very beginning as a very simple data container. The user (programmer) is supposed to write wrappers around these containers. AFAIK, there is no access level control; all set/get methods are public. Don't forget that ProtoBuf is only a simple way to store and restore data. It seems that you are after a very generic use case: automatic serialization/deserialization of complex structures with inheritance. The next logical question would be access levels, etc. All that would complicate things and is not what ProtoBuf is made for.
Re: [protobuf] ProtoBuf and Multi-Threads
I like the interest in the topic. I mentioned 1GB to emphasize that the use case is safe. In fact, I save messages in the file in the following way: XYXYXYXYXY, where X is the size of the message and Y is the message itself. Each message is read in a loop and overwritten. Clearly, I do *not* read the whole file (N GB) into memory at once.

Now, with this technique, I can generate files larger than 2^31 bytes (~2GB). The file is written successfully. Consider the case of a 5 GB file. Unfortunately, whenever I start reading this 5 GB file, ProtoBuf fails after 2^31 bytes are read. Of course, I have to raise the limit of read bytes with:

    CodedInputStream::SetTotalBytesLimit(int, int)

Pay attention to the argument types: *int*. I suppose ProtoBuf uses a bytes-read counter or some internal file position pointer that is also *int* and therefore fails whenever the reading procedure passes the 2^31 threshold.

Thanks for the link to perftools. As you mentioned, I reuse the message in my code, so there is no overhead. I guess the problem was in the way I measured execution time. My command looked like:

    time executable args && echo "-" && time executable args

So I've cut it into 3 pieces, and the time shown on the screen starts to make sense:

    time executable args
    echo --
    time executable args
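The XYXY layout described above can be sketched without ProtoBuf at all. The snippet below is only an illustration under stated assumptions: the post does not say how the size X is encoded, so a fixed 4-byte little-endian prefix is assumed here, and the names `write_record`, `read_record` and `fits_in_int32` are hypothetical helpers, not ProtoBuf API. It reads one record at a time into a reused buffer and keeps the running byte count in a 64-bit integer, which is what a counter needs in order to get past 2^31.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Append one record as <size><payload>. The size is stored as a fixed
// 4-byte little-endian integer (an assumption of this sketch).
void write_record(std::ostream &out, const std::string &payload)
{
    assert(payload.size() <= UINT32_MAX);
    const uint32_t size = static_cast<uint32_t>(payload.size());
    char prefix[4];
    for (int i = 0; i < 4; ++i)
        prefix[i] = static_cast<char>((size >> (8 * i)) & 0xFF);
    out.write(prefix, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read the next record into `payload`, reusing its storage.
// `bytes_read` is 64-bit so it can keep counting past 2^31.
bool read_record(std::istream &in, std::string &payload, int64_t &bytes_read)
{
    unsigned char prefix[4];
    if (!in.read(reinterpret_cast<char *>(prefix), 4))
        return false;
    uint32_t size = 0;
    for (int i = 0; i < 4; ++i)
        size |= static_cast<uint32_t>(prefix[i]) << (8 * i);
    payload.resize(size);
    if (size > 0 && !in.read(&payload[0], static_cast<std::streamsize>(size)))
        return false;
    bytes_read += 4 + static_cast<int64_t>(size);
    return true;
}

// True if a byte offset is representable by a signed 32-bit counter.
bool fits_in_int32(int64_t offset)
{
    return offset <= INT32_MAX;
}
```

With these helpers, a 384 MB offset passes `fits_in_int32` while a 5 GB offset does not, which matches the point at which the read described above stops.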
Re: [protobuf] ProtoBuf and Multi-Threads
btw, ProtoBuf is really fast and easy to use. I like it.
Re: [protobuf] ProtoBuf and Multi-Threads
I've added the synched cout wrapper and fixed the use of the C "float" function. Eventually the code started working as expected. For example, on an 8-core computer the performance measurements are:

    Generate 20 files with 10 events in each

    WRITING
    ===
    Generate ProtoBuf
    real 0m15.608s
    user 0m6.582s
    sys  0m1.383s

    READING
    ===
    Read ProtoBuf
    Processed events: 200
    real 0m6.992s
    user 0m6.393s
    sys  0m0.534s

    READING (MULTITHREADS)
    =
    8: data_1.pb   7: data_10.pb  6: data_11.pb  5: data_12.pb  4: data_13.pb
    3: data_14.pb  2: data_15.pb  1: data_16.pb  7: data_17.pb  5: data_18.pb
    4: data_19.pb  6: data_2.pb   8: data_20.pb  1: data_3.pb   3: data_4.pb
    5: data_5.pb   7: data_6.pb   6: data_7.pb   8: data_8.pb   1: data_9.pb
    Thread read 20 events
    Thread read 10 events
    Thread read 20 events
    Thread read 30 events
    Thread read 30 events
    Thread read 30 events
    Thread read 30 events
    Thread read 30 events
    real 0m1.527s
    user 0m7.877s
    sys  0m0.432s

So, reading is ~4.6 times faster in the multi-threaded case.
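The speedup figure follows from the two "real" times in the log above. As a small worked check, the hypothetical helpers below (names chosen for this sketch only) divide the single-threaded wall-clock time by the multi-threaded one and relate the result to the 8 cores: 6.992 s / 1.527 s comes to about 4.58, i.e. roughly 57% of ideal linear scaling.

```cpp
#include <cassert>

// Wall-clock speedup of the parallel run over the serial run.
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Fraction of ideal linear scaling achieved on `cores` cores.
double efficiency(double serial_seconds, double parallel_seconds, int cores)
{
    assert(cores > 0);
    return speedup(serial_seconds, parallel_seconds) / cores;
}
```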
Re: [protobuf] ProtoBuf and Multi-Threads
Thanks for the quick reply. Honestly, I fill a set of histograms for each event. I've added this feature only recently and have a version of the code without histograms. Here is the same performance measurement without histograms:

    READING
    ===
    Read ProtoBuf
    Processed events: 100
    real 0m2.510s
    user 0m2.105s
    sys  0m0.298s
    ---===---
    READING (MULTITHREADS)
    =
    process files
    init threads
    start threads
    run threads
    Thread read 100 events
    real 0m2.358s
    user 0m2.085s
    sys  0m0.236s

Again, the same situation. My file is 384MB. I've already tested the use case with files above 1GB. It turns out that ProtoBuf has an "int" limitation on file size. Anyway, I am way below the limit. The messages are pretty short (~400B).
[protobuf] ProtoBuf and Multi-Threads
Hi, I have a large set of files, each holding a number of saved messages of the same type. My code reads the messages from these files in sequence, one file after another. I've measured the running time of the code (with the terminal "time" command) and get something like:

    READING
    ===
    Read ProtoBuf
    Processed events: 5000
    real 7m2.146s
    user 5m25.545s
    sys  0m31.959s

Then I adjusted the code to read the files in threads (8 threads on an 8-core machine). The reading procedure is self-contained and put into a separate class, so each thread is really independent of the others. Nevertheless, the time measurement is:

    READING (MULTITHREADS)
    =
    Thread read 600 events
    Thread read 600 events
    Thread read 600 events
    Thread read 600 events
    Thread read 600 events
    Thread read 600 events
    Thread read 700 events
    Thread read 700 events
    real 5m3.808s
    user 5m42.301s
    sys  0m35.221s

As you can see, the "user" as well as the "real" time is pretty much the same. So it seems that there are some internal locks somewhere. I only use locks between the threads and the class that creates and manages them, and those locks are taken only when a thread finishes reading its file(s). Does ProtoBuf use some sort of generic static/singleton functions or objects to de-serialize messages, which would lock when accessed from different threads? If so, is there a way to suppress this and get truly independent message parsing? Thanks.

P.S. My code can be browsed on GitHub: http://goo.gl/DXCCF . The reading of messages is done by: http://goo.gl/OsHV9 . The code uses the ROOT framework (root.cern.ch), if one wants to compile it.
Re: [protobuf] GZip Stream examples
Well, I have to update the very first value, the raw number at the beginning of the file, after all messages are written. I do not know a priori how many messages will be stored. Therefore, I use fstream::seekp(0) to move the write pointer before the file is closed and update the value. Of course, the number is written without optimizations, with WriteLittleEndian32(...). It does not seem I can do the same with Gzip: the number would be compressed differently depending on its value and therefore may have a different length.
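The placeholder-then-patch technique described above can be sketched with plain standard-library streams. This is a minimal illustration, not the poster's code: `put_u32_le`, `get_u32_le` and `write_with_count` are hypothetical names, the records are written as raw bytes without ProtoBuf, and a `std::stringstream` stands in for the file. The point is that the header always occupies exactly 4 bytes, so overwriting it later cannot disturb the data that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write `value` as exactly 4 bytes, least significant byte first.
void put_u32_le(std::ostream &out, uint32_t value)
{
    char bytes[4];
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    out.write(bytes, 4);
}

// Read 4 little-endian bytes back into an integer.
uint32_t get_u32_le(std::istream &in)
{
    unsigned char bytes[4] = {0, 0, 0, 0};
    in.read(reinterpret_cast<char *>(bytes), 4);
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<uint32_t>(bytes[i]) << (8 * i);
    return value;
}

// Reserve the header, append the records, then go back and patch the header.
void write_with_count(std::ostream &out, const std::vector<std::string> &records)
{
    put_u32_le(out, 0);                       // placeholder, always 4 bytes
    uint32_t count = 0;
    for (const std::string &r : records) {
        out.write(r.data(), static_cast<std::streamsize>(r.size()));
        ++count;
    }
    const std::ostream::pos_type end = out.tellp();
    out.seekp(0);                             // back to the header
    put_u32_le(out, count);                   // overwrites exactly 4 bytes
    out.seekp(end);                           // restore the end position
    assert(out.good());
}
```

A compressed stream offers no such guarantee, because the compressed form of the header can change length with its value; that is the concern raised in the message above.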
Re: [protobuf] GZip Stream examples
Cool, it worked great. Can I mix raw output and Gzip output in the same file? Say, I'd like to write a raw number (4 bytes) at the beginning of the file and then add the message through the Gzip stream. Visually, the file would start with the raw number, 4 bytes written with raw_out, followed by the rest, GG..., written with the Gzip stream. Of course, the reading sequence would be:

1. read the raw number
2. keep reading the rest, G, through the Gzip stream.
[protobuf] GZip Stream examples
Hi, are there any examples of how to use GzipOutputStream in ProtoBuf? So far I've managed this combination:

    _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
    _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));

(both objects are boost::shared_ptr's). How am I supposed to use the GzipOutputStream here? Thanks.
Re: [protobuf] A protocol message was rejected because it was too big ???
Hmm, thanks for the advice. It may work fine. Nevertheless, in this case I have to skip the previously read messages every time a CodedInputStream is read. In fact, I recently faced a different problem. It turns out I can write arbitrarily long files, even 7GB, with no problems. Unfortunately, reading stops working after 2^31 bytes are read. Is there a way around this?
Re: [protobuf] A protocol message was rejected because it was too big ???
I think I found the source of the problem. CodedInputStream has an internal counter of how many bytes have been read so far with the same object. In my case, there are a lot of small messages saved in the same file. I do not read them all at once and therefore do not need to worry about large messages or limits. I am safe. So the problem can be easily solved by calling:

    CodedInputStream input_stream(...);
    input_stream.SetTotalBytesLimit(1000000000, 900000000);

My use case is really about storing an extremely large number (up to 1e9) of small messages, ~10K each.
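The effect of a cumulative byte counter can be illustrated with a toy model. The class below is hypothetical and is not ProtoBuf's CodedInputStream; it only mimics the behavior described above, where a single reader object adds up every byte it has consumed and refuses to continue past a fixed limit. The second helper shows the alternative of using a fresh reader object per message, so that the count restarts each time; whether that suits a given program is a design choice and is assumed here only for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a reader with a cumulative byte limit.
// Hypothetical class for illustration; not ProtoBuf's CodedInputStream.
class CountingReader {
public:
    explicit CountingReader(int64_t total_limit)
        : limit_(total_limit), total_(0)
    {
        assert(total_limit >= 0);
    }

    // Pretend to consume `size` bytes; refuse once the running total
    // would exceed the limit.
    bool read(int64_t size)
    {
        if (total_ + size > limit_)
            return false;
        total_ += size;
        return true;
    }

private:
    int64_t limit_;
    int64_t total_;
};

// One reader shared by all messages: the counter keeps growing.
int64_t read_with_shared_reader(int64_t limit, int64_t size, int64_t messages)
{
    CountingReader reader(limit);
    int64_t done = 0;
    while (done < messages && reader.read(size))
        ++done;
    return done;
}

// A fresh reader per message: the counter restarts from zero each time.
int64_t read_with_fresh_readers(int64_t limit, int64_t size, int64_t messages)
{
    int64_t done = 0;
    while (done < messages) {
        CountingReader reader(limit);
        if (!reader.read(size))
            break;
        ++done;
    }
    return done;
}
```

With a 67108864-byte limit and 1000-byte messages, the shared reader stops after 67108 messages even though no single message is large, while the per-message readers get through all of them.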
Re: [protobuf] A protocol message was rejected because it was too big ???
How come? I explicitly track the largest message written to the file with: http://goo.gl/SAKlU . Here is an example of the output I get:

    [1 ProtoBuf git.hist]$ ./bin/write data.pb && echo "---===---" && ./bin/read data.pb
    Saved: 100040 events
    Largest message size writte: 1815 bytes
    ---===---
    File has: 100040 events
    libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading dangerously large protocol message. If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
    libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
    libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
    Read: 86209 events
    Largest message read: 1815 bytes
    [1 ProtoBuf git.hist]$

As you can see, the largest message is only 1815 bytes (!). But due to the above error I cannot read the rest of the messages. It does not make sense.
[protobuf] A protocol message was rejected because it was too big ???
Hi, I generate a huge number of messages of the same type and save them one by one in a file. Each message is generated and then saved on the fly. This way I do not keep a large array of messages in memory, only one at a time. Everything works fine. The largest message written is about 2K (serialized string size). Then I read these messages one by one from the file and use them. Again, I keep only one message in memory at a time. Everything works fine if I have, say, ~10e4 messages. Once the number of saved messages is increased to something like 10e6, I get warnings from ProtoBuf, like:

    libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading dangerously large protocol message. If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

and then:

    libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

What might be wrong? Here is my code (it is very short and simple):

    Message: http://goo.gl/mzmTB
    Write executable: http://goo.gl/SH41R
    Writer (Output Wrapper): http://goo.gl/Fr0Rf
    Read executable: http://goo.gl/UpC5i
    Reader (Input Wrapper): http://goo.gl/zAeuU

The errors/warnings start if one changes 1e4 to 1e6 at: http://goo.gl/1IBZS

Thanks.
Re: [protobuf] Store number of messages
Hmm, it makes sense now and explains everything. Unfortunately, I didn't see a way to write a fixed-width number with CodedOutputStream. Is there a way to do this?
[protobuf] Store number of messages
I have an application where the number of messages is a priori unknown. I can open the output file and write 0 at the beginning:

    Writer::Writer(const string &filename):
        _output(filename.c_str(), ios::out | ios::trunc | ios::binary),
        _events_written(0)
    {
        _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
        _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));
        _coded_out->WriteVarint32(_events_written);
    }

and then in the destructor reset the position in the ofstream to the beginning to write the number of events:

    Writer::~Writer()
    {
        _coded_out.reset();
        _raw_out.reset();
        _output.seekp(0);

        // Small trick to save number of the events
        //
        _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
        _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));
        _coded_out->WriteVarint32(_events_written);

        _coded_out.reset();
        _raw_out.reset();
        _output.close();
    }

Then read the beginning of the file with the number of events:

    Reader::Reader(const string &filename):
        _input(filename.c_str(), ios::in | ios::binary),
        _is_good(true),
        _events_written(0)
    {
        _raw_in.reset(new ::google::protobuf::io::IstreamInputStream(&_input));
        _coded_in.reset(new ::google::protobuf::io::CodedInputStream(_raw_in.get()));
        _coded_in->ReadVarint32(&_events_written);
    }

Everything seems fine. Then events can be read one by one like:

    bool Reader::read(Event &event)
    {
        event.Clear();

        uint32_t message_size;
        if (!_coded_in->ReadVarint32(&message_size))
        {
            _is_good = false;
            return false;
        }

        if (0 < message_size)
        {
            string message;
            if (!_coded_in->ReadString(&message, message_size) ||
                !event.ParseFromString(message))
                return false;
        }

        return true;
    }

Unfortunately, ReadVarint32 fails for some reason in the Reader::read(...) method. The code works fine if I do not use seekp in Writer::~Writer(). What is the proper way to seek to the beginning of the file and store the number of entries?
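One likely explanation, consistent with the follow-up question about fixed-width numbers, is that a varint does not have a fixed size: the placeholder 0 takes one byte, while a larger event count takes more, so rewriting the header after seekp(0) can spill over into the first message's size field. The stand-alone sketch below encodes values in the base-128 varint format (7 payload bits per byte, high bit set on every byte except the last) to show how the width changes. `encode_varint32` is a hypothetical helper written for this illustration and does not call ProtoBuf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode `value` as a base-128 varint: 7 payload bits per byte, least
// significant group first, high bit set on all bytes except the last.
std::string encode_varint32(uint32_t value)
{
    std::string out;
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
    return out;
}

// Number of bytes the varint encoding of `value` occupies (1 to 5).
std::size_t varint32_size(uint32_t value)
{
    const std::size_t size = encode_varint32(value).size();
    assert(size >= 1 && size <= 5);
    return size;
}
```

Here `varint32_size(0)` is 1 and `varint32_size(100040)` is 3, so a header first written as 0 and later rewritten as 100040 would overwrite two bytes that belong to the data after it. A fixed 4-byte header does not have this problem.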
[protobuf] Do Protocol Buffers read entire collection of messages into memory from file?
Hi, I am wondering how Protocol Buffers read input files. Is the entire file read into memory, or is some proxy technique used so that entries are read only when required? This is a vital feature for large lists, say, a dataset with 10^9 messages. Do Protocol Buffers use any additional archiving technique (zip, tar, etc.) to further compress the saved information? Sincerely, Sam.