My question is: should I have a single message, something like

message Record {
  required HeaderMessage header = 1;
  optional TradeMessage trade = 2;
  repeated QuoteMessage quotes = 3;    // 0 or more
  repeated CustomMessage customs = 4;  // 0 or more
}

or should I rather keep the file as a plain sequence of object type, object, object type, object, without worrying about the concept of a record? Each message in the file is usually a header plus exactly one type of message (trade, quote, or custom), and in most cases only one quote or custom message, not more. Which would be faster to decode?

Regards,
Alok

On Jan 11, 12:41 pm, alok <alok.jad...@gmail.com> wrote:
> Hi everyone,
>
> My program is taking more time to read binary files than the text
> files. I think the issue is with the structure of the binary files
> that I have designed. (Or could it be that binary decoding is
> slower than text file parsing?)
>
> The data file is a large text file with 1 record per row, up to 1.2 GB.
> The binary file is around 900 MB.
>
> - Reading the text file takes 3 minutes.
> - Reading the binary file takes 5 minutes.
>
> I saw a very strange behavior.
> - Just to see how long it takes to skim through the binary file, I
> started reading the header on each message, which holds the length of the
> message, and then skipped that many bytes using the Skip() function of
> the coded_input object. After making this change, I was expecting that
> reading through the file would take less time, but it took more than 10
> minutes. Is skipping not the same as adding n bytes to the file pointer?
> Is it slower to skip an object than to read it?
>
> Are there any guidelines on how the structure should be designed to
> get the best performance?
>
> My current structure looks as below:
>
> message HeaderMessage {
>   required double timestamp = 1;
>   required string ric_code = 2;
>   required int32 count = 3;
>   required int32 total_message_size = 4;
> }
>
> message QuoteMessage {
>   enum Side {
>     ASK = 0;
>     BID = 1;
>   }
>   required Side type = 1;
>   required int32 level = 2;
>   optional double price = 3;
>   optional int64 size = 4;
>   optional int32 count = 5;
>   optional HeaderMessage header = 6;
> }
>
> message CustomMessage {
>   required string field_name = 1;
>   required double value = 2;
>   optional HeaderMessage header = 3;
> }
>
> message TradeMessage {
>   optional double price = 1;
>   optional int64 size = 2;
>   optional int64 AccumulatedVolume = 3;
>   optional HeaderMessage header = 4;
> }
>
> The binary file format is
> object type, object, object type, object, ...
>
> The 1st object of a record holds the header, which gives the number n of
> objects in that record. The next n-1 objects do not hold a header, since
> they all belong to the same record (same update time).
> The (n+1)th object belongs to a new record and holds the header for
> that next record.
>
> Any advice?
>
> Regards,
> Alok

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
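
To make the "object type, object, object type, object" layout and the skim-by-length idea concrete, here is a minimal C++ sketch. It does not use the protobuf library. It assumes a hypothetical framing of a one-byte type tag followed by a varint length and then the payload bytes, whereas the post above stores the length in the header's total_message_size field. The whole file is assumed to be already loaded into an in-memory buffer. All names (Kind, put_varint, get_varint, append_frame, skim) are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical object types, mirroring the four messages in the post.
enum class Kind : std::uint8_t { Header = 0, Trade = 1, Quote = 2, Custom = 3 };

// Append v as a base-128 varint (7 payload bits per byte, high bit = "more").
void put_varint(std::string& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Read a varint starting at pos and advance pos past it.
// Returns false if the buffer ends before the varint does.
bool get_varint(const std::string& buf, std::size_t& pos, std::uint64_t& v) {
    v = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const std::uint8_t b = static_cast<std::uint8_t>(buf[pos++]);
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return true;
    }
    return false;
}

// Write one frame: type tag, payload length, payload bytes.
void append_frame(std::string& out, Kind kind, const std::string& payload) {
    out.push_back(static_cast<char>(kind));
    put_varint(out, payload.size());
    out.append(payload);
}

// Skim the buffer: count frames per type without decoding any payload.
// Skipping a payload here is plain index arithmetic on the buffer.
std::vector<std::size_t> skim(const std::string& buf) {
    std::vector<std::size_t> counts(4, 0);
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::uint8_t kind = static_cast<std::uint8_t>(buf[pos++]);
        std::uint64_t len = 0;
        if (kind > 3 || !get_varint(buf, pos, len) || len > buf.size() - pos) {
            break;  // unknown tag or truncated frame: stop scanning
        }
        ++counts[kind];
        pos += static_cast<std::size_t>(len);
    }
    return counts;
}
```

In this sketch a skip costs the same no matter how large the payload is, because the data is already in memory. Whether Skip() on a stream-backed coded_input behaves the same way depends on the underlying stream, so timing the sketch against a real file would be one way to check where the 10 minutes go.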