Any suggestions? Experiences?

Regards,
Alok

On Jan 11, 1:16 pm, alok <alok.jad...@gmail.com> wrote:
> My point is: should I have one message, something like
>
> message Record {
>   required HeaderMessage header = 1;
>   optional TradeMessage trade = 2;
>   repeated QuoteMessage quotes = 3;   // 0 or more
>   repeated CustomMessage customs = 4; // 0 or more
> }
>
> or should I rather keep my file plain, as
> object type, object, object type, object
> without worrying about the concept of a record?
>
> Each message in the file is usually a header plus any one type of
> message (trade, quote or custom), and mostly only one quote or custom
> message, not more.
>
> What would be faster to decode?
>
> Regards,
> Alok
>
> On Jan 11, 12:41 pm, alok <alok.jad...@gmail.com> wrote:
> > Hi everyone,
> >
> > My program is taking more time to read binary files than text
> > files. I think the issue is with the structure of the binary files
> > that I have designed. (Or could it be that binary decoding is
> > slower than text file parsing?)
> >
> > The data file is a large text file with 1 record per row, up to 1.2 GB.
> > The binary file is around 900 MB.
> >
> > - Reading the text file takes 3 minutes.
> > - Reading the binary file takes 5 minutes.
> >
> > I saw a very strange behavior.
> > - Just to see how long it takes to skim through the binary file, I
> > started reading the header on each message, which holds the length of
> > the message, and then skipped that many bytes using the Skip() function
> > of the coded_input object. After making this change, I was expecting
> > that reading through the file would take less time, but it took more
> > than 10 minutes. Is skipping not the same as adding n bytes to the file
> > pointer? Is it slower to skip an object than to read it?
> >
> > Are there any guidelines on how the structure should be designed to
> > get the best performance?
> >
> > My current structure looks as below:
> >
> > message HeaderMessage {
> >   required double timestamp = 1;
> >   required string ric_code = 2;
> >   required int32 count = 3;
> >   required int32 total_message_size = 4;
> > }
> >
> > message QuoteMessage {
> >   enum Side {
> >     ASK = 0;
> >     BID = 1;
> >   }
> >   required Side type = 1;
> >   required int32 level = 2;
> >   optional double price = 3;
> >   optional int64 size = 4;
> >   optional int32 count = 5;
> >   optional HeaderMessage header = 6;
> > }
> >
> > message CustomMessage {
> >   required string field_name = 1;
> >   required double value = 2;
> >   optional HeaderMessage header = 3;
> > }
> >
> > message TradeMessage {
> >   optional double price = 1;
> >   optional int64 size = 2;
> >   optional int64 AccumulatedVolume = 3;
> >   optional HeaderMessage header = 4;
> > }
> >
> > The binary file format is
> > object type, object, object type, object ...
> >
> > The 1st object of a record holds the header, which gives the number of
> > objects (n) in that record. The next n-1 objects do not hold a header,
> > since they all belong to the same record (same update time).
> > The (n+1)th object belongs to a new record, and it holds the header
> > for that next record.
> >
> > Any advice?
> >
> > Regards,
> > Alok

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
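
For readers following the thread, the "object type, object, object type, object" layout and the idea of skimming by length can be pictured with a small standalone sketch. It does not use Protocol Buffers, and the frame layout (a 1-byte type tag plus a 4-byte little-endian length) and the names `write_frame`, `read_frames` and `skim_frames` are assumptions made for illustration, not the poster's actual file format. It only shows the difference between reading each payload and moving the file pointer past it. Whether a given library's skip call behaves like a seek depends on the stream underneath it, which this sketch does not try to model.

```python
import io
import struct

# Hypothetical frame layout (an assumption, not the poster's real format):
#   1 byte   object type
#   4 bytes  payload length, little-endian unsigned
#   N bytes  payload (for example, one serialized message)
FRAME_HEADER = struct.Struct("<BI")


def write_frame(stream, obj_type, payload):
    """Append one frame: type tag, payload length, then the payload bytes."""
    stream.write(FRAME_HEADER.pack(obj_type, len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield (obj_type, payload) for every frame, reading each payload."""
    while True:
        header = stream.read(FRAME_HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < FRAME_HEADER.size:
            raise ValueError("truncated frame header")
        obj_type, length = FRAME_HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated payload")
        yield obj_type, payload


def skim_frames(stream):
    """Yield (obj_type, length) per frame, seeking past payloads unread."""
    while True:
        header = stream.read(FRAME_HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < FRAME_HEADER.size:
            raise ValueError("truncated frame header")
        obj_type, length = FRAME_HEADER.unpack(header)
        stream.seek(length, io.SEEK_CUR)  # only the file pointer moves
        yield obj_type, length


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frame(buf, 1, b"header-bytes")
    write_frame(buf, 2, b"quote-bytes")
    buf.seek(0)
    print(list(skim_frames(buf)))  # [(1, 12), (2, 11)]
```

On a seekable file, `skim_frames` touches only the 5-byte headers, so its cost grows with the number of frames rather than with the file size. On a stream that cannot seek, skipping has to consume the bytes anyway, so it cannot be cheaper than reading.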