Here is the link to a forum post that explains why I have to set the limit: http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3wj2htwajjof+state:results
Excerpt from the link:

"The problem is that CodedInputStream has internal counter of how many bytes are read so far with the same object. In my case, there are a lot of small messages saved in the same file. I do not read them at once and therefore do not care about large messages, limits. I am safe.

So, the problem can be easily solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use-case is really about storing extremely large number (up to 1e9) of small messages ~ 10K each."

My problem is the same as above, so I will have to set the limits on the coded input object.

Regards,
Alok

On Jan 16, 10:26 am, alok <alok.jad...@gmail.com> wrote:
> I was actually doing that initially, but I kept getting an error on
> "Maximum length for a message is reached" (I don't have the exact error
> string at the moment). This was because my input binary file is large
> and it reaches the limit for coded input very fast.
>
> I saw a post on the forum (or maybe on Stack Exchange) which suggested
> that I should create a new coded_input object for each message. I have
> to reset the limits for the coded input object. A user on that thread
> suggested that it's easy to create and destroy the coded_input object.
> These objects are not big.
>
> Anyway, I will try it again by resetting the limits on this object.
> But then, would this be causing the slowness? I will try and let you
> know the results.
>
> Regards,
> Alok
>
> On Jan 16, 9:46 am, Daniel Wright <dwri...@google.com> wrote:
> > You're making a new CodedInputStream for each message -- I think that gives
> > very poor buffering behavior. You should just pass coded_input to
> > ReadAllMessages and keep reusing it.
> >
> > Cheers
> > Daniel
> >
> > On Sun, Jan 15, 2012 at 4:41 PM, alok <alok.jad...@gmail.com> wrote:
> > > Daniel,
> > >
> > > I am hoping that my code is incorrect, but I am not sure what is wrong
> > > or what is really causing this slowness.
> > >
> > > @ Henner Zeller, sorry I forgot to include the object length in the above
> > > example. I do store the object length for each object. I don't have issues
> > > in reading all the objects. The code is working fine. I just want to make
> > > sure to be able to make the code run faster now.
> > >
> > > Attaching my code here...
> > >
> > > File format is
> > >
> > > File header
> > > Record1, Record2, Record3
> > >
> > > Each record contains n objects of a type defined in the proto file. The 1st
> > > object has a header which contains the number of objects in each record.
> > >
> > > <code>
> > > proto file
> > >
> > > message HeaderMessage {
> > >     required double timestamp = 1;
> > >     required string ric_code = 2;
> > >     required int32 count = 3;
> > >     required int32 total_message_size = 4;
> > > }
> > >
> > > message QuoteMessage {
> > >     enum Side {
> > >         ASK = 0;
> > >         BID = 1;
> > >     }
> > >     required Side type = 1;
> > >     required int32 level = 2;
> > >     optional double price = 3;
> > >     optional int64 size = 4;
> > >     optional int32 count = 5;
> > >     optional HeaderMessage header = 6;
> > > }
> > >
> > > message CustomMessage {
> > >     required string field_name = 1;
> > >     required double value = 2;
> > >     optional HeaderMessage header = 3;
> > > }
> > >
> > > message TradeMessage {
> > >     optional double price = 1;
> > >     optional int64 size = 2;
> > >     optional int64 AccumulatedVolume = 3;
> > >     optional HeaderMessage header = 4;
> > > }
> > >
> > > message AlphaMessage {
> > >     required int32 level = 1;
> > >     required double alpha = 2;
> > >     optional double stddev = 3;
> > >     optional HeaderMessage header = 4;
> > > }
> > >
> > > </code>
> > >
> > > <code>
> > > Reading records from binary file
> > >
> > > bool ReadNextRecord(CodedInputStream *coded_input,
> > >                     stdext::hash_set<std::string> instruments)
> > > {
> > >     uint32 count, objtype, objlen;
> > >     int i;
> > >
> > >     int objectsread = 0;
> > >     HeaderMessage *hMsg = NULL;
> > >     TradeMessage tMsg;
> > >     QuoteMessage qMsg;
> > >     CustomMessage cMsg;
> > >     AlphaMessage aMsg;
> > >
> > >     while(1)
> > >     {
> > >         if(!coded_input->ReadLittleEndian32(&objtype)) {
> > >             return false;
> > >         }
> > >         if(!coded_input->ReadLittleEndian32(&objlen)) {
> > >             return false;
> > >         }
> > >         CodedInputStream::Limit lim = coded_input->PushLimit(objlen);
> > >
> > >         switch(objtype)
> > >         {
> > >         case 2:
> > >             qMsg.ParseFromCodedStream(coded_input);
> > >             if(qMsg.has_header())
> > >             {
> > >                 //hMsg =
> > >                 hMsg = new HeaderMessage();
> > >                 hMsg->Clear();
> > >                 hMsg->Swap(qMsg.mutable_header());
> > >             }
> > >             objectsread++;
> > >             break;
> > >
> > >         case 3:
> > >             tMsg.ParseFromCodedStream(coded_input);
> > >             if(tMsg.has_header())
> > >             {
> > >                 //hMsg = tMsg.mutable_header();
> > >                 hMsg = new HeaderMessage();
> > >                 hMsg->Clear();
> > >                 hMsg->Swap(tMsg.mutable_header());
> > >             }
> > >             objectsread++;
> > >             break;
> > >
> > >         case 4:
> > >             aMsg.ParseFromCodedStream(coded_input);
> > >             if(aMsg.has_header())
> > >             {
> > >                 //hMsg = aMsg.mutable_header();
> > >                 hMsg = new HeaderMessage();
> > >                 hMsg->Clear();
> > >                 hMsg->Swap(aMsg.mutable_header());
> > >             }
> > >             objectsread++;
> > >             break;
> > >
> > >         case 5:
> > >             cMsg.ParseFromCodedStream(coded_input);
> > >             if(cMsg.has_header())
> > >             {
> > >                 //hMsg = cMsg.mutable_header();
> > >                 hMsg = new HeaderMessage();
> > >                 hMsg->Clear();
> > >                 hMsg->Swap(cMsg.mutable_header());
> > >             }
> > >             objectsread++;
> > >             break;
> > >
> > >         default:
> > >             cout << "Invalid object type "<< objtype << endl;
> > >             return false;
> > >             break;
> > >         }
> > >         coded_input->PopLimit(lim);
> > >         if(objectsread == hMsg->count()) break;
> > >     }
> > >     return true;
> > > }
> > >
> > > void ReadAllMessages(ZeroCopyInputStream *raw_input,
> > >                      stdext::hash_set<std::string> instruments)
> > > {
> > >     int item_count = 0;
> > >     while(1)
> > >     {
> > >         CodedInputStream in(raw_input);
> > >         if(!ReadNextRecord(&in, instruments))
> > >             break;
> > >         item_count++;
> > >     }
> > >     cout << "Finished reading file. Total "<<item_count<<" items read."<<endl;
> > > }
> > >
> > > int _tmain(int argc, _TCHAR* argv[])
> > > {
> > >     GOOGLE_PROTOBUF_VERIFY_VERSION;
> > >
> > >     ZeroCopyInputStream *raw_input;
> > >     CodedInputStream *coded_input;
> > >     stdext::hash_set<std::string> instruments;
> > >
> > >     string filename = "S:/users/aaj/sandbox/tickdata/bin/hk/2011/2011.01.04.bin";
> > >     int fd = _open(filename.c_str(), _O_BINARY | O_RDONLY);
> > >
> > >     if( fd == -1 )
> > >     {
> > >         printf( "Error opening the file. \n" );
> > >         exit( 1 );
> > >     }
> > >
> > >     raw_input = new FileInputStream(fd);
> > >     coded_input = new CodedInputStream(raw_input);
> > >
> > >     uint32 magic_no;
> > >
> > >     coded_input->ReadLittleEndian32(&magic_no);
> > >
> > >     cout << "HEADER: " << "\t" << magic_no<<endl;
> > >     cout << "Reading data objects.." << endl;
> > >     delete coded_input;
> > >     cout << td << '\n';
> > >
> > >     ReadAllMessages(raw_input, instruments);
> > >
> > >     cout << td << '\n';
> > >
> > >     delete raw_input;
> > >     _close(fd);
> > >     google::protobuf::ShutdownProtobufLibrary();
> > >
> > >     return 0;
> > > }
> > >
> > > </code>
> > >
> > > On Jan 14, 3:37 am, Henner Zeller <henner.zel...@googlemail.com> wrote:
> > > > On Fri, Jan 13, 2012 at 11:22, Daniel Wright <dwri...@google.com> wrote:
> > > > > It's extremely unlikely that text parsing is faster than binary parsing on
> > > > > pretty much any message. My guess is that there's something wrong in the
> > > > > way you're reading the binary file -- e.g. no buffering, or possibly a bug
> > > > > where you hand the protobuf library multiple messages concatenated together.
> > > >
> > > > In particular, the
> > > > object type, object, object type object ..
> > > > doesn't seem to include headers that describe the length of the
> > > > following message, but such a separator is needed.
> > > > (http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...)
> > > >
> > > > > It'd be easier to comment if you post the code.
> > > > >
> > > > > Cheers
> > > > > Daniel
> > > > >
> > > > > On Fri, Jan 13, 2012 at 1:22 AM, alok
> > ...

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
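
For anyone following the thread, here is a minimal sketch of the framing discussed above (object type, object length, payload, both integers as 32-bit little-endian), written in plain standard C++ rather than protobuf. It only illustrates the point Daniel made: one reader object is created once and reused for every record, instead of a new one per record. FrameReader, Frame, AppendFrame and ReadAllFrames are hypothetical names made up for this sketch; they are not part of the protobuf library, and the sketch does not model SetTotalBytesLimit or buffering over a real file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one framed object: [type:u32 LE][length:u32 LE][payload].
struct Frame {
    std::uint32_t type;
    std::string payload;
};

// Appends a 32-bit value in little-endian byte order.
void AppendLittleEndian32(std::string* out, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        out->push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Writes one frame: type, payload length, then the payload bytes.
void AppendFrame(std::string* out, std::uint32_t type, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    AppendLittleEndian32(out, type);
    AppendLittleEndian32(out, static_cast<std::uint32_t>(payload.size()));
    out->append(payload);
}

// Hypothetical reader that keeps ONE cursor over the whole input, playing the
// role of a single reused coded input object.
class FrameReader {
public:
    explicit FrameReader(const std::string& data) : data_(data), pos_(0) {}

    // Reads a 32-bit little-endian integer; returns false if fewer than 4 bytes remain.
    bool ReadLittleEndian32(std::uint32_t* value) {
        if (data_.size() - pos_ < 4) return false;
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v |= static_cast<std::uint32_t>(
                     static_cast<unsigned char>(data_[pos_ + i])) << (8 * i);
        }
        pos_ += 4;
        *value = v;
        return true;
    }

    // Reads the next frame; returns false at end of input or on a truncated frame.
    bool Next(Frame* out) {
        std::uint32_t type = 0, len = 0;
        if (!ReadLittleEndian32(&type)) return false;
        if (!ReadLittleEndian32(&len)) return false;
        if (data_.size() - pos_ < len) return false;
        out->type = type;
        out->payload.assign(data_, pos_, len);
        pos_ += len;
        return true;
    }

private:
    std::string data_;   // copy of the input, so the reader owns its bytes
    std::size_t pos_;    // always <= data_.size()
};

// Reads every complete frame using a single reader created outside the loop.
std::vector<Frame> ReadAllFrames(const std::string& data) {
    std::vector<Frame> frames;
    FrameReader reader(data);  // created once, reused for all frames
    Frame f;
    while (reader.Next(&f)) {
        frames.push_back(f);
    }
    return frames;
}
```

To try it, build a buffer with a few AppendFrame calls and pass it to ReadAllFrames. In the real protobuf code the payload step would be the PushLimit / ParseFromCodedStream / PopLimit sequence from the quoted ReadNextRecord.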