[protobuf] Re: suggestions on improving the performance?

alok Tue, 10 Jan 2012 21:16:35 -0800

My question is: should I have one message, something like

message Record {
  required HeaderMessage header = 1;
  optional TradeMessage trade = 2;
  repeated QuoteMessage quotes = 3;   // 0 or more
  repeated CustomMessage customs = 4; // 0 or more
}


Or should I rather keep my file flat, as
object type, object, object type, object, ...
without worrying about the concept of a record?

Each message in the file is usually a header plus one message of a
single type (trade, quote or custom), and mostly only one quote or
custom message, not more.

Which would be faster to decode?
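
A minimal sketch of the flat layout above (object type, object, object type, object, ...), written in Python for brevity. It is not protobuf's own wire format: the 1-byte type tag, the 4-byte little-endian length prefix, the type ids and the helper names write_object / read_objects are all assumptions made for illustration, and the payloads stand in for already-serialized messages.

```python
import io
import struct

# Hypothetical type ids; the original post does not define any.
TRADE, QUOTE, CUSTOM = 1, 2, 3

# Assumed framing: 1-byte object type followed by a 4-byte little-endian length.
PREFIX = struct.Struct("<BI")


def write_object(stream, type_id, payload):
    """Append one 'object type, object' pair to a binary stream."""
    stream.write(PREFIX.pack(type_id, len(payload)))
    stream.write(payload)


def read_objects(stream):
    """Yield (type_id, payload) pairs until the stream is exhausted."""
    while True:
        prefix = stream.read(PREFIX.size)
        if not prefix:
            return  # clean end of file
        if len(prefix) < PREFIX.size:
            raise EOFError("truncated prefix")
        type_id, length = PREFIX.unpack(prefix)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield type_id, payload


def roundtrip_example():
    """Write one trade and one quote, then read them back."""
    buf = io.BytesIO()
    write_object(buf, TRADE, b"serialized TradeMessage bytes")
    write_object(buf, QUOTE, b"serialized QuoteMessage bytes")
    buf.seek(0)
    return list(read_objects(buf))
```

With a single Record message per update instead, each record would need only one length prefix and one parse call; which variant decodes faster for this data would have to be measured.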

Regards,
Alok


On Jan 11, 12:41 pm, alok <alok.jad...@gmail.com> wrote:
> Hi everyone,
>
> My program takes more time to read binary files than text files.
> I think the issue is with the structure of the binary files that I
> have designed. (Or could binary decoding be slower than text file
> parsing?)
>
> The data file is a large text file with 1 record per row, up to 1.2 GB.
> The binary file is around 900 MB.
>
>  - Reading the text file takes 3 minutes.
>  - Reading the binary file takes 5 minutes.
>
> I saw some very strange behavior.
>  - Just to see how long it takes to skim through the binary file, I
> started reading the header of each message, which holds the length of
> the message, and then skipped that many bytes using the Skip()
> function of the coded_input object. After making this change, I
> expected reading through the file to take less time, but it took more
> than 10 minutes. Is skipping not the same as adding n bytes to the
> file pointer? Is it slower to skip an object than to read it?
>
> Are there any guidelines on how the structure should be designed to
> get the best performance?
>
> My current structure looks like this:
>
> message HeaderMessage {
>   required double timestamp = 1;
>   required string ric_code = 2;
>   required int32 count = 3;
>   required int32 total_message_size = 4;
> }
>
> message QuoteMessage {
>   enum Side {
>     ASK = 0;
>     BID = 1;
>   }
>   required Side type = 1;
>   required int32 level = 2;
>   optional double price = 3;
>   optional int64 size = 4;
>   optional int32 count = 5;
>   optional HeaderMessage header = 6;
> }
>
> message CustomMessage {
>   required string field_name = 1;
>   required double value = 2;
>   optional HeaderMessage header = 3;
> }
>
> message TradeMessage {
>   optional double price = 1;
>   optional int64 size = 2;
>   optional int64 AccumulatedVolume = 3;
>   optional HeaderMessage header = 4;
> }
>
> The binary file format is
> object type, object, object type, object, ...
>
> The 1st object of a record holds the header, which gives the number
> of objects n in that record. The next n-1 objects do not hold a header,
> since they all belong to the same record (same update time).
> The (n+1)th object belongs to a new record and holds the header for
> that record.
>
> Any advice?
>
> Regards,
> Alok
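
The skimming experiment in the quoted message (read the length, then skip that many bytes) can be compared against a plain seek-based baseline. The sketch below is such a baseline in Python, not protobuf's Skip() implementation; the 1-byte type tag and 4-byte length prefix are assumed framing, and count_objects / build_sample are hypothetical helpers. Whether a library's skip call becomes a file-pointer move or a read-and-discard depends on the underlying stream, so timing both variants is the safest check.

```python
import io
import struct
from collections import Counter

# Assumed framing: 1-byte object type followed by a 4-byte little-endian length.
PREFIX = struct.Struct("<BI")


def count_objects(stream):
    """Count objects per type id, seeking past every payload instead of reading it."""
    counts = Counter()
    while True:
        prefix = stream.read(PREFIX.size)
        if len(prefix) < PREFIX.size:
            break  # end of file (or a truncated trailing prefix)
        type_id, length = PREFIX.unpack(prefix)
        stream.seek(length, io.SEEK_CUR)  # advance the file pointer past the payload
        counts[type_id] += 1
    return counts


def build_sample():
    """Build a small in-memory file: two objects of type 2 and one of type 1."""
    buf = io.BytesIO()
    for type_id, payload in [(2, b"quote-a"), (1, b"trade"), (2, b"quote-b")]:
        buf.write(PREFIX.pack(type_id, len(payload)) + payload)
    buf.seek(0)
    return buf
```

On a real file, the same function can be given the object returned by open(path, "rb"), so that seek operates on the file itself.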

