[protobuf] Re: Install python protobuf in user folder

2011-09-28 Thread ksamdev
Never mind, I had to select the MacPorts python for this to work.




[protobuf] Install python protobuf in user folder

2011-09-28 Thread ksamdev
Hi,

I have OS X Lion and would like to install the protobuf 2.4.1 python libs in 
a user folder.

The C++ part was successfully installed with MacPorts. However, the port 
protobuf-python27 does not seem to have any effect: I've installed it, but 
cannot find any installed files.

I've also tried to follow the easy_install instructions, but cannot even get 
easy_install itself set up.

Is there an easy out-of-the-box solution for Python, similar to the C++ 
protobuf?

thanks.




[protobuf] Re: Copy nested repeated messages

2011-06-07 Thread ksamdev
Oh, I forgot to add some info on the protobuf version: I use v2.3.0 with C++.




[protobuf] Copy nested repeated messages

2011-06-07 Thread ksamdev
Hi,

I've got nested messages like:

message A
{
   required double value = 1;
}

message B
{
  required A a = 1;
}

message C
{
  repeated B entries = 1;
}

The C object is saved in a file and may have any number of B entries. However, 
I'd now like to save a copy of C in a different file, keeping only the B's that 
match some criteria, e.g.:

...
C original;
// Read the original object from a file

C *copy = new C();
typedef ::google::protobuf::RepeatedPtrField<B> Bs;

for(Bs::const_iterator b = original.entries().begin();
    original.entries().end() != b;
    ++b)
{
    if (b->a().value() < 5)
        continue;

    // Copy the B entry and add it to the filtered C
    B *entry = copy->add_entries();
    *entry = *b;
}
// save copy

Sometimes I get the following error when reading the filtered C back:

libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message 
of type "C" because it is missing required fields: entries[0].A

It seems that the deep copy of the nested repeated messages fails for some 
reason. Any ideas on how to fix this?
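
For reference, here is a minimal, self-contained sketch of the same filtering 
loop written with CopyFrom(), which should be equivalent to the assignment 
above. It assumes the classes generated from the .proto definitions above; the 
header name and the FilterEntries function are placeholders, not my actual code:

#include "messages.pb.h"  // hypothetical header generated from the .proto above

// Copy only the B entries whose a.value() passes the cut. CopyFrom performs a
// deep copy (including the nested A), same as operator= used above.
void FilterEntries(const C &original, C *copy)
{
    for (int i = 0; i < original.entries_size(); ++i)
    {
        const B &b = original.entries(i);
        if (b.a().value() < 5)
            continue;

        copy->add_entries()->CopyFrom(b);
    }
}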




[protobuf] ProtoBuf 2.4.0(a)

2011-03-29 Thread ksamdev
Hi,

I am a bit confused: has the final version v2.4.0 of ProtoBuf been released? 
I've heard from multiple sources that it has, even though the official project 
web page links to v2.4.0a. Is that an "alpha" version? What does the "a" mean? 
Is it a stable release?

Thanks.




[protobuf] Re: Inheritance..

2011-03-21 Thread ksamdev
Hi,

I guess ProtoBuf was designed from the very beginning to be a very simple data 
container. The user (programmer) is supposed to write wrappers around these 
containers. AFAIK, there is no access-level control: all set/get methods are 
public.

Don't forget that ProtoBuf is only a simple way to store and restore data.

It seems that you are trying to cover a very generic use case: automatic 
serialization/deserialization of complex structures with inheritance. The next 
logical question would be access levels, etc. All of that would complicate 
things and is not what ProtoBuf is made for.
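
Just to illustrate the wrapper idea, a rough sketch only; the Person message, 
its name field, and the PersonRecord class are all made up for the example:

#include <string>

#include "person.pb.h"  // hypothetical generated message with a string "name" field

// A thin domain wrapper that controls access; the generated Person message
// stays a plain data container underneath and is never exposed directly.
class PersonRecord
{
public:
    explicit PersonRecord(const std::string &name) { _data.set_name(name); }

    const std::string &name() const { return _data.name(); }

    bool serialize(std::string *out) const { return _data.SerializeToString(out); }

private:
    Person _data;
};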




Re: [protobuf] ProtoBuf and Multi-Threads

2011-03-15 Thread ksamdev
I like the interest in the topic. 

I've put 1 GB to emphasize that the use case is safe. In fact, I save messages 
to a file in the following way:

XYXYXYXYXY...

where X is the size of the message and Y is the message itself. Each message 
is read in a loop and the same object is overwritten. Clearly, I do *not* read 
the whole file (N GB) into memory at once.
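
For reference, a minimal sketch of this size-prefix framing with 
CodedOutputStream/CodedInputStream; the Event type, the event.pb.h header and 
the WriteOne/ReadOne names are placeholders, not the actual code:

#include <string>

#include <google/protobuf/io/coded_stream.h>

#include "event.pb.h"  // hypothetical header for the Event message used below

// Write one size-prefixed record: X (varint size) followed by Y (the bytes).
void WriteOne(google::protobuf::io::CodedOutputStream &out, const Event &event)
{
    std::string buffer;
    event.SerializeToString(&buffer);

    out.WriteVarint32(static_cast<google::protobuf::uint32>(buffer.size()));  // X
    out.WriteString(buffer);                                                  // Y
}

// Read one size-prefixed record back, overwriting the same Event object.
bool ReadOne(google::protobuf::io::CodedInputStream &in, Event *event)
{
    google::protobuf::uint32 size = 0;
    if (!in.ReadVarint32(&size))
        return false;  // clean end of stream

    std::string buffer;
    if (!in.ReadString(&buffer, size))
        return false;

    return event->ParseFromString(buffer);
}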

Now, with this technique, I can generate files larger than 2^31 bytes (~2 GB). 
The file is written successfully; consider the case of a 5 GB file. 
Unfortunately, whenever I start reading this 5 GB file, ProtoBuf fails after 
2^31 bytes have been read. Of course, I push the limit of read bytes with:

CodedInputStream::SetTotalBytesLimit(int, int)

Pay attention to the argument types: *int*. I suppose ProtoBuf uses a 
bytes-read counter, or some internal file-read position, that is also an *int*, 
and it therefore fails whenever the reading procedure passes the 2^31 threshold.
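
If that counter really is the culprit, one possible workaround (my assumption, 
not something verified) is to create a short-lived CodedInputStream per record 
on top of the long-lived raw stream, so its internal byte count never grows 
beyond a single message. Event and ReadOneRecord are placeholders, as in the 
sketch above:

#include <string>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream.h>

#include "event.pb.h"  // hypothetical Event message

// The raw ZeroCopyInputStream stays open for the whole file; only the
// CodedInputStream (and therefore its byte counter) is recreated per record.
bool ReadOneRecord(google::protobuf::io::ZeroCopyInputStream *raw_in, Event *event)
{
    google::protobuf::io::CodedInputStream in(raw_in);  // fresh counter each call

    google::protobuf::uint32 size = 0;
    if (!in.ReadVarint32(&size))
        return false;

    std::string buffer;
    if (!in.ReadString(&buffer, size))
        return false;

    return event->ParseFromString(buffer);
}

Constructing a CodedInputStream is cheap, and its destructor returns any 
unread buffered bytes to the underlying stream, so the next call picks up 
exactly where the previous one stopped.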

Thanks for the link to perftools. As you mentioned, I reuse the message in my 
code, so there is no overhead there.

I guess the problem was in the way I measured execution time. My command 
looked like:

time executable args && echo "-" && time executable args

So I've cut it into three pieces, and the time shown on the screen starts to 
make sense:

time executable args
echo --
time executable args




Re: [protobuf] ProtoBuf and Multi-Threads

2011-03-14 Thread ksamdev
btw, ProtoBuf is really fast and easy to use. I like it.




Re: [protobuf] ProtoBuf and Multi-Threads

2011-03-14 Thread ksamdev
I've added a synchronized cout wrapper and fixed the C "float" function use. 
Eventually the code started working as expected; for example, on an 8-core 
computer the performance measurements are:

Generate 20 files with 10 events in each

WRITING
===

Generate ProtoBuf

real 0m15.608s
user 0m6.582s
sys 0m1.383s

READING
===

Read ProtoBuf
Processed events: 200

real 0m6.992s
user 0m6.393s
sys 0m0.534s

READING (MULTITHREADS)
=

8: data_1.pb
7: data_10.pb
6: data_11.pb
5: data_12.pb
4: data_13.pb
3: data_14.pb
2: data_15.pb
1: data_16.pb
7: data_17.pb
5: data_18.pb
4: data_19.pb
6: data_2.pb
8: data_20.pb
1: data_3.pb
3: data_4.pb
5: data_5.pb
7: data_6.pb
6: data_7.pb
8: data_8.pb
1: data_9.pb
Thread read 20 events
Thread read 10 events
Thread read 20 events
Thread read 30 events
Thread read 30 events
Thread read 30 events
Thread read 30 events
Thread read 30 events

real 0m1.527s
user 0m7.877s
sys 0m0.432s

So, reading is ~4.66 times faster in the multi-threaded case.




Re: [protobuf] ProtoBuf and Multi-Threads

2011-03-14 Thread ksamdev
Thanks for the quick reply.

Honestly, I fill a set of histograms for each event. I added this feature only 
recently and still have a version of the code without histograms.

Here is the same performance measurement without histograms:

READING
===

Read ProtoBuf
Processed events: 100

real 0m2.510s
user 0m2.105s
sys 0m0.298s
---===---

READING (MULTITHREADS)
=

process files
init threads
start threads
run threads
Thread read 100 events

real 0m2.358s
user 0m2.085s
sys 0m0.236s

Again, the same situation.

My file is 384 MB. I've already tested the use case with files above 1 GB; it 
turns out that ProtoBuf has an *int* limitation on file size. Anyway, I am way 
below that limit here. The messages are pretty short (~400 B).




[protobuf] ProtoBuf and Multi-Threads

2011-03-14 Thread ksamdev
Hi,

I have a large set of files, each containing a number of messages of the same 
type. My code reads the messages sequentially from these files, one after 
another. I've measured the running time (with the terminal "time" command) and 
get something like:

READING
===

Read ProtoBuf
Processed events: 5000

real 7m2.146s
user 5m25.545s
sys 0m31.959s

Then I adjusted the code to read the files in threads (8 threads on an 8-core 
machine). The reading procedure is independent and lives in a separate class, 
so each thread is really independent of the others. Nevertheless, the time 
measurement is:

READING (MULTITHREADS)
=

Thread read 600 events
Thread read 600 events
Thread read 600 events
Thread read 600 events
Thread read 600 events
Thread read 600 events
Thread read 700 events
Thread read 700 events

real 5m3.808s
user 5m42.301s
sys 0m35.221s

As you may see, the "user" as well as the "real" time is pretty much the same.

So it seems that there is some internal locking somewhere. I only use locks 
between the threads and the class that creates and manages them, and those 
locks are taken only when a thread finishes reading its file(s).

Does ProtoBuf use some sort of generic static/singleton functions or objects 
to de-serialize messages, and therefore lock when they are accessed from 
different threads? If so, is there a way to suppress this and get truly 
independent message parsing?
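
For completeness, this is roughly the setup I have in mind, with nothing 
ProtoBuf-related shared between threads. A simplified sketch: Reader, Event 
and the reader.h header stand in for the classes in the linked code:

#include <string>
#include <thread>
#include <vector>

#include "reader.h"  // hypothetical header for the Reader/Event classes

// Each thread constructs its own Reader, so file streams, CodedInputStream
// and the message object are all thread-local.
void ReadFile(const std::string &filename)
{
    Reader reader(filename);

    Event event;
    while (reader.read(event))
    {
        // process the event; nothing here touches shared state
    }
}

void ReadAll(const std::vector<std::string> &filenames)
{
    std::vector<std::thread> threads;
    for (const std::string &filename : filenames)
        threads.emplace_back(ReadFile, filename);

    for (std::thread &thread : threads)
        thread.join();
}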

thanks.

P.S. My code can be browsed on GitHub: http://goo.gl/DXCCF . The reading of 
messages is done by: http://goo.gl/OsHV9 .
The code uses the ROOT framework (root.cern.ch) if one wants to compile it.




Re: [protobuf] GZip Stream examples

2011-03-09 Thread ksamdev
Well, I have to update the very first value (the number of stored messages) 
after all messages are written. I do not know a priori how many messages will 
be stored. Therefore, I use seekp(0) on the fstream to move the write pointer 
before the file is closed and update the value. Of course, the number is 
written without varint encoding, using WriteLittleEndian32(...), so it always 
occupies 4 bytes.

It does not seem that I can do the same with Gzip: the number would be 
compressed differently depending on its value and could therefore end up with 
a different length.




Re: [protobuf] GZip Stream examples

2011-03-08 Thread ksamdev
Cool, it worked great.

Can I mix raw output and Gzip output in the same file?

Say, I'd like to write a raw number (4 bytes) at the beginning of the file and 
then add the messages through the Gzip stream. Visually, my file would look 
like:

NGGGGGGGG...G

where the leading N is 4 bytes written with raw_out and the rest (GG...G) is 
written through the Gzip stream.

Of course, the reading sequence would be:

1. read N
2. keep reading the rest (GG...G) through the Gzip stream.
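
A rough sketch of what I have in mind (WriteFile and the header parameter are 
made-up names; the key assumption is that destroying the inner 
CodedOutputStream flushes its buffer back to the raw stream before the Gzip 
stream is attached):

#include <fstream>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/gzip_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

using namespace google::protobuf::io;

void WriteFile(std::ofstream &output, google::protobuf::uint32 header)
{
    OstreamOutputStream raw_out(&output);

    {
        // 4-byte raw header; the CodedOutputStream is destroyed (and backs up
        // its unused buffer) before anything compressed is written.
        CodedOutputStream coded_out(&raw_out);
        coded_out.WriteLittleEndian32(header);
    }

    // Everything from here on goes through gzip, appended after the header.
    GzipOutputStream gzip_out(&raw_out);
    CodedOutputStream coded_out(&gzip_out);
    // ... WriteVarint32(size) + WriteString(message) per record, as before ...
}

On the reading side, the same trick would apply: a short-lived CodedInputStream 
for the 4-byte header, then a GzipInputStream over the same raw stream for the 
rest.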




[protobuf] GZip Stream examples

2011-03-08 Thread ksamdev
Hi,

Are there any examples of how to use GzipOutputStream in ProtoBuf?
So far I've managed the following combination:

_raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
_coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));

(both objects are boost::shared_ptr's).

How am I supposed to use GzipOutputStream here?
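
If I understand the stream classes correctly, GzipOutputStream is itself a 
ZeroCopyOutputStream, so it should slot in between the two objects above. A 
simplified sketch; the GzipWriter struct is made up, the member names mirror 
the snippet above:

#include <fstream>

#include <boost/shared_ptr.hpp>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/gzip_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

// Streams are layered: ofstream <- OstreamOutputStream <- GzipOutputStream
// <- CodedOutputStream. Resetting in reverse order flushes everything.
struct GzipWriter
{
    explicit GzipWriter(std::ofstream &output)
    {
        _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&output));
        _gzip_out.reset(new ::google::protobuf::io::GzipOutputStream(_raw_out.get()));
        _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_gzip_out.get()));
    }

    ~GzipWriter()
    {
        _coded_out.reset();
        _gzip_out.reset();   // finishes and flushes the gzip stream
        _raw_out.reset();
    }

    boost::shared_ptr< ::google::protobuf::io::OstreamOutputStream> _raw_out;
    boost::shared_ptr< ::google::protobuf::io::GzipOutputStream> _gzip_out;
    boost::shared_ptr< ::google::protobuf::io::CodedOutputStream> _coded_out;
};

Reading would mirror this, with a GzipInputStream sitting between the 
IstreamInputStream and the CodedInputStream.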

thanks.




Re: [protobuf] A protocol message was rejected because it was too big ???

2011-03-07 Thread ksamdev
Hmm, thanks for the advice. That may work fine; nevertheless, in that case I 
have to skip the previously read messages every time a new CodedInputStream is 
created.

In fact, I ran into a different problem recently. It turns out I can write 
arbitrarily long files, even 7 GB, with no problems.

Unfortunately, reading stops working after 2^31 bytes have been read. Is there 
a way around this?




Re: [protobuf] A protocol message was rejected because it was too big ???

2011-03-06 Thread ksamdev
I think I found the source of the problem: CodedInputStream keeps an internal 
counter of how many bytes have been read so far through the same object.

In my case, there are a lot of small messages saved in the same file. I do not 
read them all at once and therefore do not care about the large-message 
limits; I am safe.

So the problem can easily be solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1000000000, 900000000);

My use case is really about storing an extremely large number (up to 1e9) of 
small messages, ~10K each.




Re: [protobuf] A protocol message was rejected because it was too big ???

2011-03-06 Thread ksamdev
How come? I explicitly track the largest message written to the file 
with: http://goo.gl/SAKlU

Here is an example of output I get:

[1 ProtoBuf git.hist]$ ./bin/write data.pb && echo "---===---" && ./bin/read 
data.pb
Saved: 100040 events
Largest message size writte: 1815 bytes
---===---
File has: 100040 events
libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading 
dangerously large protocol message.  If the message turns out to be larger 
than 67108864 bytes, parsing will be halted for security reasons.  To 
increase the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message 
was rejected because it was too big (more than 67108864 bytes).  To increase 
the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message 
was rejected because it was too big (more than 67108864 bytes).  To increase 
the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Read: 86209 events
Largest message read: 1815 bytes
[1 ProtoBuf git.hist]$

As you can see, the largest message is only 1815 bytes (!), yet because of the 
above error I cannot read the rest of the messages.

It does not make sense.




[protobuf] A protocol message was rejected because it was too big ???

2011-03-06 Thread ksamdev
Hi,

I generate a huge number of messages of the same type and save them one by one 
in a file. Each message is generated and then saved on the fly, so I never 
keep a large array of messages in memory, only one at a time. Everything works 
fine. The largest message written is about 2K (serialized string size).

Then I read these messages back one by one from the file and use them, again 
keeping only one message in memory at a time. Everything works fine if I have, 
say, ~10e4 messages.

Once the number of saved messages is increased to something like 10e6, I get 
warnings from ProtoBuf like:

libprotobuf WARNING google/protobuf/io/coded_stream.cc:478] Reading 
dangerously large protocol message.  If the message turns out to be larger 
than 67108864 bytes, parsing will be halted for security reasons.  To 
increase the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

and then:

libprotobuf ERROR google/protobuf/io/coded_stream.cc:147] A protocol message 
was rejected because it was too big (more than 67108864 bytes).  To increase 
the limit (or to disable these warnings), see 
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

What might be wrong?


Here is my code (it is very short and simple):

Message: http://goo.gl/mzmTB

Write executable: http://goo.gl/SH41R
Writer (Output Wrapper): http://goo.gl/Fr0Rf

Read executable: http://goo.gl/UpC5i
Reader (Input Wrapper): http://goo.gl/zAeuU

The errors/warnings start if one changes 1e4 to 1e6 at: http://goo.gl/1IBZS

Thanks.




Re: [protobuf] Store number of messages

2011-03-05 Thread ksamdev
Hmm, that makes sense now and explains everything. Unfortunately, I didn't see 
a way to write a fixed-width number with CodedOutputStream. Is there a way to 
do this?
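
For reference, CodedOutputStream does provide fixed-width writers 
(WriteLittleEndian32/WriteLittleEndian64), with matching readers on 
CodedInputStream. A tiny sketch; the WriteCount/ReadCount names are made up:

#include <google/protobuf/io/coded_stream.h>

// Write the event count as exactly 4 bytes so it can be overwritten in place
// later; unlike WriteVarint32, the encoded width never depends on the value.
void WriteCount(google::protobuf::io::CodedOutputStream &out,
                google::protobuf::uint32 events_written)
{
    out.WriteLittleEndian32(events_written);
}

bool ReadCount(google::protobuf::io::CodedInputStream &in,
               google::protobuf::uint32 *events_written)
{
    return in.ReadLittleEndian32(events_written);
}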




[protobuf] Store number of messages

2011-03-04 Thread ksamdev
I have an application where the number of messages is a priori unknown. I can 
open the output file and write 0 at the beginning:

Writer::Writer(const string &filename):
    _output(filename.c_str(), ios::out | ios::trunc | ios::binary),
    _events_written(0)
{
    _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
    _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));

    _coded_out->WriteVarint32(_events_written);
}

and then, in the destructor, reset the position in the ofstream to the 
beginning and write the number of events:

Writer::~Writer()
{
    _coded_out.reset();
    _raw_out.reset();

    _output.seekp(0);

    // Small trick to save the number of events
    //
    _raw_out.reset(new ::google::protobuf::io::OstreamOutputStream(&_output));
    _coded_out.reset(new ::google::protobuf::io::CodedOutputStream(_raw_out.get()));

    _coded_out->WriteVarint32(_events_written);

    _coded_out.reset();
    _raw_out.reset();

    _output.close();
}

Then I read the number of events from the beginning of the file:

Reader::Reader(const string &filename):
    _input(filename.c_str(), ios::in | ios::binary),
    _is_good(true),
    _events_written(0)
{
    _raw_in.reset(new ::google::protobuf::io::IstreamInputStream(&_input));
    _coded_in.reset(new ::google::protobuf::io::CodedInputStream(_raw_in.get()));

    _coded_in->ReadVarint32(&_events_written);
}

Everything seems fine. Then events can be read one by one like:

bool Reader::read(Event &event)
{
    event.Clear();

    uint32_t message_size;
    if (!_coded_in->ReadVarint32(&message_size))
    {
        _is_good = false;

        return false;
    }

    if (0 < message_size)
    {
        string message;
        if (!_coded_in->ReadString(&message, message_size) ||
            !event.ParseFromString(message))

            return false;
    }

    return true;
}

Unfortunately, ReadVarint32 fails for some reason in the Reader::read(...) 
method.

The code works fine if I do not use seekp in Writer::~Writer().

What is the proper way to seek to the beginning of the file and store the 
number of entries?




[protobuf] Do Protocol Buffers read entire collection of messages into memory from file?

2010-10-19 Thread ksamdev
Hi,

I am wondering how Protocol Buffers read input files. Is the entire file read 
into memory, or is some proxy technique used so that entries are read only 
when required?

This is a vital feature for large lists, say a dataset with 10^9 messages.

Do Protocol Buffers use any additional archiving technique (zip, tar, etc.) to 
further compress the saved information?

sincerely, Sam.
