Any more suggestions?

On Jan 16, 11:14 am, alok <alok.jad...@gmail.com> wrote:
> google groups link:
> http://groups.google.com/group/protobuf/browse_thread/thread/64a07911...
>
> I tested the code with reusing the coded input object. Not much change
> in the speed performance.
>
> void ReadAllMessages(ZeroCopyInputStream *raw_input,
>                      stdext::hash_set<std::string> instruments)
> {
>     int item_count = 0;
>
>     CodedInputStream* in = new CodedInputStream(raw_input);
>     in->SetTotalBytesLimit(1e9, 9e8);
>     while(1)
>     {
>         if(item_count % 200000 == 0){
>             delete in;
>             in = new CodedInputStream(raw_input);
>             in->SetTotalBytesLimit(1e9, 9e8);
>         }
>         if(!ReadNextRecord(in, instruments))
>             break;
>         item_count++;
>     }
>     cout << "Finished reading file. Total "<<item_count<<" items read."<<endl;
>
> }
>
> I reuse the coded input object for every 200k objects. There are a total
> of around 650k objects in the file.
>
> I have a feeling that this slowness may be because of my binary file
> format. Is there anything I can change so that I can read it faster,
> e.g. removing optional fields and keeping the format as raw as
> possible?
>
> Regards,
> Alok
>
> On Jan 16, 10:40 am, alok <alok.jad...@gmail.com> wrote:
> > Here is the link to a forum post which states why I have to set the limit.
> >
> > http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3w...
> >
> > Excerpt from the link:
> >
> > "The problem is that CodedInputStream has internal counter of how many
> > bytes are read so far with the same object.
> >
> > In my case, there are a lot of small messages saved in the same file.
> > I do not read them at once and therefore do not care about large
> > messages, limits. I am safe.
> >
> > So, the problem can be easily solved by calling:
> >
> > CodedInputStream input_stream(...);
> > input_stream.SetTotalBytesLimit(1e9, 9e8);
> >
> > My use-case is really about storing extremely large number (up to 1e9)
> > of small messages ~ 10K each.
> > "
> >
> > My problem is the same as above, so I will have to set the limits on
> > the coded input object.
> >
> > Regards,
> > Alok
> >
> > On Jan 16, 10:26 am, alok <alok.jad...@gmail.com> wrote:
> > > I was actually doing that initially, but I kept getting an error like
> > > "Maximum length for a message is reached" (I don't have the exact error
> > > string at the moment). This was because my input binary file is large
> > > and it reaches the limit for coded input very fast.
> > >
> > > I saw a post on the forum (or maybe on Stack Exchange) which suggested
> > > that I should create a new coded_input object for each message. I have
> > > to reset the limits for the coded input object. The user on that thread
> > > suggested that it is easy to create and destroy coded_input objects.
> > > These objects are not big.
> > >
> > > Anyway, I will try it again by resetting the limits on this object.
> > > But then, would this be causing the slowness? I will try and let you
> > > know the results.
> > >
> > > Regards,
> > > Alok
> > >
> > > On Jan 16, 9:46 am, Daniel Wright <dwri...@google.com> wrote:
> > > > You're making a new CodedInputStream for each message -- I think that
> > > > gives very poor buffering behavior. You should just pass coded_input
> > > > to ReadAllMessages and keep reusing it.
> > > >
> > > > Cheers
> > > > Daniel
> > > >
> > > > On Sun, Jan 15, 2012 at 4:41 PM, alok <alok.jad...@gmail.com> wrote:
> > > > > Daniel,
> > > > >
> > > > > I am hoping that my code is incorrect, but I am not sure what is wrong
> > > > > or what is really causing this slowness.
> > > > >
> > > > > @ Henner Zeller, sorry, I forgot to include the object length in the
> > > > > above example. I do store the object length for each object. I don't
> > > > > have issues in reading all the objects. The code is working fine. I
> > > > > just want to make the code run faster now.
> > > > >
> > > > > Attaching my code here...
> > > > >
> > > > > File format is
> > > > >
> > > > > File header
> > > > > Record1, Record2, Record3
> > > > >
> > > > > Each record contains n objects of a type defined in the proto file. The
> > > > > 1st object has a header which contains the number of objects in each
> > > > > record.
> > > > >
> > > > > <code>
> > > > > proto file
> > > > >
> > > > > message HeaderMessage {
> > > > >   required double timestamp = 1;
> > > > >   required string ric_code = 2;
> > > > >   required int32 count = 3;
> > > > >   required int32 total_message_size = 4;
> > > > > }
> > > > >
> > > > > message QuoteMessage {
> > > > >   enum Side {
> > > > >     ASK = 0;
> > > > >     BID = 1;
> > > > >   }
> > > > >   required Side type = 1;
> > > > >   required int32 level = 2;
> > > > >   optional double price = 3;
> > > > >   optional int64 size = 4;
> > > > >   optional int32 count = 5;
> > > > >   optional HeaderMessage header = 6;
> > > > > }
> > > > >
> > > > > message CustomMessage {
> > > > >   required string field_name = 1;
> > > > >   required double value = 2;
> > > > >   optional HeaderMessage header = 3;
> > > > > }
> > > > >
> > > > > message TradeMessage {
> > > > >   optional double price = 1;
> > > > >   optional int64 size = 2;
> > > > >   optional int64 AccumulatedVolume = 3;
> > > > >   optional HeaderMessage header = 4;
> > > > > }
> > > > >
> > > > > message AlphaMessage {
> > > > >   required int32 level = 1;
> > > > >   required double alpha = 2;
> > > > >   optional double stddev = 3;
> > > > >   optional HeaderMessage header = 4;
> > > > > }
> > > > >
> > > > > </code>
> > > > >
> > > > > <code>
> > > > > Reading records from binary file
> > > > >
> > > > > bool ReadNextRecord(CodedInputStream *coded_input,
> > > > >                     stdext::hash_set<std::string> instruments)
> > > > > {
> > > > >     uint32 count, objtype, objlen;
> > > > >     int i;
> > > > >
> > > > >     int objectsread = 0;
> > > > >     HeaderMessage *hMsg = NULL;
> > > > >     TradeMessage tMsg;
> > > > >     QuoteMessage qMsg;
> > > > >     CustomMessage cMsg;
> > > > >     AlphaMessage aMsg;
> > > > >
> > > > >     while(1)
> > > > >     {
> > > > >         if(!coded_input->ReadLittleEndian32(&objtype)) {
> > > > >             return false;
> > > > >         }
> > > > >         if(!coded_input->ReadLittleEndian32(&objlen)) {
> > > > >             return false;
> > > > >         }
> > > > >         CodedInputStream::Limit lim =
> > > > >             coded_input->PushLimit(objlen);
> > > > >
> > > > >         switch(objtype)
> > > > >         {
> > > > >         case 2:
> > > > >             qMsg.ParseFromCodedStream(coded_input);
> > > > >             if(qMsg.has_header())
> > > > >             {
> > > > >                 //hMsg =
> > > > >                 hMsg = new HeaderMessage();
> > > > >                 hMsg->Clear();
> > > > >                 hMsg->Swap(qMsg.mutable_header());
> > > > >             }
> > > > >             objectsread++;
> > > > >             break;
> > > > >
> > > > >         case 3:
> > > > >             tMsg.ParseFromCodedStream(coded_input);
> > > > >             if(tMsg.has_header())
> > > > >             {
> > > > >                 //hMsg = tMsg.mutable_header();
> > > > >                 hMsg = new HeaderMessage();
> > > > >                 hMsg->Clear();
> > > > >                 hMsg->Swap(tMsg.mutable_header());
> > > > >             }
> > > > >             objectsread++;
> > > > >             break;
> > > > >
> > > > >         case 4:
> > > > >             aMsg.ParseFromCodedStream(coded_input);
> > > > >             if(aMsg.has_header())
> > > > >             {
> > > > >                 //hMsg = aMsg.mutable_header();
> > > > >                 hMsg = new HeaderMessage();
> > > > >                 hMsg->Clear();
> > > > >                 hMsg->Swap(aMsg.mutable_header());
> > > > >             }
> > > > >             objectsread++;
> > > > >             break;
> > > > >
> > > > >         case 5:
> > > > >             cMsg.ParseFromCodedStream(coded_input);
> > > > >             if(cMsg.has_header())
> > > > >             {
> > > > >                 //hMsg = cMsg.mutable_header();
> > > > >                 hMsg = new HeaderMessage();
> > > > >                 hMsg->Clear();
> > > > >                 hMsg->Swap(cMsg.mutable_header());
> > > > >             }
> > > > >             objectsread++;
> > > > >             break;
> > > > >
> > > > >         default:
> > > > >             cout << "Invalid object type "<< objtype << endl;
> > > > >             return false;
> > > > >             break;
> > > > >         }
> > > > >         coded_input->PopLimit(lim);
> > > > >         if(objectsread == hMsg->count()) break;
> > > > >     }
> > > > >     return true;
> > > > > }
> > > > >
> > > > > void ReadAllMessages(ZeroCopyInputStream *raw_input,
> > > > >                      stdext::hash_set<std::string> instruments)
> > > > > {
> > > > > ...
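
For reference, the framing that ReadNextRecord above relies on (a little-endian 32-bit object type, a little-endian 32-bit object length, then the payload) can be walked with one long-lived cursor instead of a new reader per message, which is the spirit of Daniel's advice. The following is only a minimal sketch with no protobuf dependency: Frame, ReadLE32, AppendLE32 and ReadAllFrames are hypothetical names made up for illustration, not part of the protobuf API, and the payload is kept as raw bytes rather than parsed into a message.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for one framed object: type tag plus raw payload.
struct Frame {
    std::uint32_t type;
    std::string payload;
};

// Decode a 32-bit little-endian integer at `pos` and advance the cursor.
// Returns false (cursor untouched) if fewer than 4 bytes remain.
bool ReadLE32(const std::string& buf, std::size_t& pos, std::uint32_t& out) {
    if (pos > buf.size() || buf.size() - pos < 4) return false;
    out = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>(buf[pos + i]);
        out |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    pos += 4;
    return true;
}

// Encode a 32-bit integer as 4 little-endian bytes (used to build test input).
void AppendLE32(std::string& buf, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        buf.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Walk the whole buffer with a single cursor: type, length, payload, repeat.
// Takes its inputs by const reference so nothing is copied per call.
// Returns false on a truncated header or payload.
bool ReadAllFrames(const std::string& buf, std::vector<Frame>& frames) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::uint32_t type = 0;
        std::uint32_t len = 0;
        if (!ReadLE32(buf, pos, type) || !ReadLE32(buf, pos, len)) return false;
        if (buf.size() - pos < len) return false;  // payload is cut short
        frames.push_back(Frame{type, buf.substr(pos, len)});
        pos += len;
    }
    return true;
}
```

In the real reader the payload step would be the PushLimit / ParseFromCodedStream / PopLimit sequence on the one shared CodedInputStream. The sketch also passes its arguments by const reference; whether passing `instruments` by value to ReadNextRecord contributes to the slowness in the thread is an assumption that would need profiling to confirm.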