[protobuf] Re: suggestions on improving the performance?

alok Mon, 16 Jan 2012 22:02:27 -0800

Any more suggestions?
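
One candidate the thread does not raise, offered as an assumption rather than a diagnosis: both functions quoted below take `instruments` by value, so the whole hash set is copied on every `ReadNextRecord` call. Whether that matters depends on how large the set is. The sketch below uses a hypothetical `CountedSet` wrapper around `std::unordered_set` (standing in for `stdext::hash_set`) to count copies under each calling convention; `ContainsByValue` and `ContainsByRef` are illustrative names, not part of the quoted code.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical wrapper that counts how often the set is copy-constructed.
struct CountedSet {
    std::unordered_set<std::string> items;
    static int copies;

    CountedSet() = default;
    CountedSet(const CountedSet& other) : items(other.items) { ++copies; }
};
int CountedSet::copies = 0;

// Pass by value: the caller's set is copied on every call.
bool ContainsByValue(CountedSet instruments, const std::string& ric) {
    return instruments.items.count(ric) > 0;
}

// Pass by const reference: no copy is made.
bool ContainsByRef(const CountedSet& instruments, const std::string& ric) {
    return instruments.items.count(ric) > 0;
}
```

If the set turns out to be large, changing the parameter to `const stdext::hash_set<std::string>&` in both functions would avoid the per-call copy.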

On Jan 16, 11:14 am, alok <alok.jad...@gmail.com> wrote:
> Google Groups link:
> http://groups.google.com/group/protobuf/browse_thread/thread/64a07911...
>
> I tested the code with the coded input object reused. It made little
> difference to the speed.
>
> void ReadAllMessages(ZeroCopyInputStream *raw_input,
> stdext::hash_set<std::string> instruments)
> {
>         int item_count = 0;
>
>         CodedInputStream* in = new  CodedInputStream(raw_input);
>         in->SetTotalBytesLimit(1e9, 9e8);
>         while(1)
>         {
>                 // Recreate the stream every 200k records to reset its byte counter.
>                 if(item_count > 0 && item_count % 200000 == 0){
>                         delete in;
>                         in = new  CodedInputStream(raw_input);
>                         in->SetTotalBytesLimit(1e9, 9e8);
>                 }
>                 if(!ReadNextRecord(in, instruments))
>                         break;
>                 item_count++;
>         }
>         delete in;  // release the last stream
>         cout << "Finished reading file. Total " << item_count << " items read." << endl;
>
> }
>
> I reuse the coded input object for 200k objects at a time. There are
> around 650k objects in the file in total.
>
> I have a feeling that this slowness is caused by my binary file
> format. Is there anything I can change so that I can read it faster,
> e.g. removing optional fields and keeping the format as raw as
> possible?
>
> regards,
> Alok
>
> On Jan 16, 10:40 am, alok <alok.jad...@gmail.com> wrote:
>
> > Here is a link to a forum post that explains why I have to set the limit.
>
> >http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3w...
>
> > Excerpt from the link:
>
> > "The problem is that CodedInputStream has internal counter of how many
> > bytes are read so far with the same object.
>
> > In my case, there are a lot of small messages saved in the same file.
> > I do not read them at once and therefore do not care about large
> > messages, limits. I am safe.
>
> > So, the problem can be easily solved by calling:
>
> > CodedInputStream input_stream(...);
> > input_stream.SetTotalBytesLimit(1e9, 9e8);
>
> > My use-case is really about storing extremely large number (up to 1e9)
> > of small messages ~ 10K each. "
>
> > My problem is the same as above, so I will have to set the limits on the
> > coded input object.
>
> > Regards,
> > Alok
>
> > On Jan 16, 10:26 am, alok <alok.jad...@gmail.com> wrote:
>
> > > > I was actually doing that initially, but I kept getting an error along
> > > > the lines of "Maximum length for a message is reached" (I don't have the
> > > > exact error string at the moment). This was because my input binary file
> > > > is large and it reaches the limit for coded input very fast.
>
> > > > I saw a post on the forum (or maybe on Stack Exchange) which suggested
> > > > that I should create a new coded_input object for each message. I have
> > > > to reset the limits for the coded input object. A user on that thread
> > > > suggested that it is easy to create and destroy coded_input objects.
> > > > These objects are not big.
>
> > > > Anyway, I will try it again by resetting the limits on this object.
> > > > But then, would this be causing the slowness? I will try and let you
> > > > know the results.
>
> > > Regards,
> > > Alok
>
> > > On Jan 16, 9:46 am, Daniel Wright <dwri...@google.com> wrote:
>
> > > > > You're making a new CodedInputStream for each message -- I think that
> > > > > gives very poor buffering behavior.  You should just pass coded_input to
> > > > > ReadAllMessages and keep reusing it.
>
> > > > Cheers
> > > > Daniel
>
> > > > On Sun, Jan 15, 2012 at 4:41 PM, alok <alok.jad...@gmail.com> wrote:
> > > > > Daniel,
>
> > > > > > I am hoping that my code is incorrect, but I am not sure what is wrong
> > > > > > or what is really causing this slowness.
>
> > > > > > @Henner Zeller, sorry, I forgot to include the object length in the
> > > > > > above example. I do store the object length for each object. I don't
> > > > > > have issues reading all the objects; the code is working fine. I just
> > > > > > want to make the code run faster now.
>
> > > > > > Attaching my code here...
>
> > > > > File format is
>
> > > > > File header
> > > > > Record1, Record2, Record3
>
> > > > > > Each record contains n objects of the types defined in the proto file.
> > > > > > The first object in a record carries a header, which contains the number
> > > > > > of objects in that record.
>
> > > > > <code>
> > > > > proto file
>
> > > > > > message HeaderMessage {
> > > > > >        required double timestamp = 1;
> > > > > >        required string ric_code = 2;
> > > > > >        required int32 count = 3;
> > > > > >        required int32 total_message_size = 4;
> > > > > > }
>
> > > > > > message QuoteMessage {
> > > > > >        enum Side {
> > > > > >                ASK = 0;
> > > > > >                BID = 1;
> > > > > >        }
> > > > > >        required Side type = 1;
> > > > > >        required int32 level = 2;
> > > > > >        optional double price = 3;
> > > > > >        optional int64 size = 4;
> > > > > >        optional int32 count = 5;
> > > > > >        optional HeaderMessage header = 6;
> > > > > > }
>
> > > > > > message CustomMessage {
> > > > > >        required string field_name = 1;
> > > > > >        required double value = 2;
> > > > > >        optional HeaderMessage header = 3;
> > > > > > }
>
> > > > > > message TradeMessage {
> > > > > >        optional double price = 1;
> > > > > >        optional int64 size = 2;
> > > > > >        optional int64 AccumulatedVolume = 3;
> > > > > >        optional HeaderMessage header = 4;
> > > > > > }
>
> > > > > > message AlphaMessage {
> > > > > >        required int32 level = 1;
> > > > > >        required double alpha = 2;
> > > > > >        optional double stddev = 3;
> > > > > >        optional HeaderMessage header = 4;
> > > > > > }
>
> > > > > </code>
>
> > > > > <code>
> > > > > Reading records from binary file
>
> > > > > bool ReadNextRecord(CodedInputStream *coded_input,
> > > > > stdext::hash_set<std::string> instruments)
> > > > > {
> > > > >        uint32 count, objtype, objlen;
> > > > >        int i;
>
> > > > >        int objectsread = 0;
> > > > >        HeaderMessage *hMsg = NULL;
> > > > >        TradeMessage tMsg;
> > > > >        QuoteMessage qMsg;
> > > > >        CustomMessage cMsg;
> > > > >        AlphaMessage aMsg;
>
> > > > >        while(1)
> > > > >        {
> > > > >                if(!coded_input->ReadLittleEndian32(&objtype)) {
> > > > >                        return false;
> > > > >                }
> > > > >                if(!coded_input->ReadLittleEndian32(&objlen)) {
> > > > >                        return false;
> > > > >                }
> > > > >                CodedInputStream::Limit lim =
> > > > > coded_input->PushLimit(objlen);
>
> > > > >                switch(objtype)
> > > > >                {
> > > > >                case 2:
> > > > >                        qMsg.ParseFromCodedStream(coded_input);
> > > > >                        if(qMsg.has_header())
> > > > >                        {
> > > > >                                //hMsg =
> > > > >                                hMsg = new HeaderMessage();
> > > > >                                hMsg->Clear();
> > > > >                                hMsg->Swap(qMsg.mutable_header());
> > > > >                        }
> > > > >                        objectsread++;
> > > > >                        break;
>
> > > > >                case 3:
> > > > >                        tMsg.ParseFromCodedStream(coded_input);
> > > > >                        if(tMsg.has_header())
> > > > >                        {
> > > > >                                //hMsg = tMsg.mutable_header();
> > > > >                                hMsg = new HeaderMessage();
> > > > >                                hMsg->Clear();
> > > > >                                hMsg->Swap(tMsg.mutable_header());
> > > > >                        }
> > > > >                        objectsread++;
> > > > >                        break;
>
> > > > >                case 4:
> > > > >                        aMsg.ParseFromCodedStream(coded_input);
> > > > >                        if(aMsg.has_header())
> > > > >                        {
> > > > >                                //hMsg = aMsg.mutable_header();
> > > > >                                hMsg = new HeaderMessage();
> > > > >                                hMsg->Clear();
> > > > >                                hMsg->Swap(aMsg.mutable_header());
> > > > >                        }
> > > > >                        objectsread++;
> > > > >                        break;
>
> > > > >                case 5:
> > > > >                        cMsg.ParseFromCodedStream(coded_input);
> > > > >                        if(cMsg.has_header())
> > > > >                        {
> > > > >                                //hMsg = cMsg.mutable_header();
> > > > >                                hMsg = new HeaderMessage();
> > > > >                                hMsg->Clear();
> > > > >                                hMsg->Swap(cMsg.mutable_header());
> > > > >                        }
> > > > >                        objectsread++;
> > > > >                        break;
>
> > > > >                default:
> > > > >                        cout << "Invalid object type "<< objtype <<
> > > > > endl;
> > > > >                        return false;
> > > > >                        break;
> > > > >                }
> > > > >                coded_input->PopLimit(lim);
> > > > >                if(objectsread == hMsg->count()) break;
> > > > >        }
> > > > >        return true;
> > > > > }
>
> > > > > void ReadAllMessages(ZeroCopyInputStream *raw_input,
> > > > > stdext::hash_set<std::string> instruments)
> > > > > {
> > > > >
>
> ...
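
The per-object framing that `ReadNextRecord` relies on (a 32-bit little-endian type, a 32-bit little-endian length, then the serialized object) can be sketched without protobuf. This is a minimal standard-library illustration, not the poster's code: `Frame`, `AppendU32LE`, `AppendFrame`, `ReadU32LE`, and `ReadFrames` are hypothetical names, and the sketch assumes the framed bytes are already in memory as a `std::string` with any file header skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One framed object: the type tag and the still-serialized payload bytes.
struct Frame {
    std::uint32_t type;
    std::string payload;
};

// Appends v to buf as 4 bytes, least significant byte first.
void AppendU32LE(std::string* buf, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        buf->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
}

// Writes one frame: type, payload length, payload.
void AppendFrame(std::string* buf, std::uint32_t type, const std::string& payload) {
    AppendU32LE(buf, type);
    AppendU32LE(buf, static_cast<std::uint32_t>(payload.size()));
    buf->append(payload);
}

// Reads a little-endian uint32 at *pos and advances *pos.
// Returns false if fewer than 4 bytes remain.
bool ReadU32LE(const std::string& buf, std::size_t* pos, std::uint32_t* out) {
    assert(*pos <= buf.size());
    if (buf.size() - *pos < 4) return false;
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        const unsigned char byte = static_cast<unsigned char>(buf[*pos + i]);
        v |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    *out = v;
    *pos += 4;
    return true;
}

// Splits buf into frames. Returns false on a truncated header or payload.
bool ReadFrames(const std::string& buf, std::vector<Frame>* frames) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::uint32_t type = 0, len = 0;
        if (!ReadU32LE(buf, &pos, &type)) return false;
        if (!ReadU32LE(buf, &pos, &len)) return false;
        if (buf.size() - pos < len) return false;
        frames->push_back(Frame{type, buf.substr(pos, len)});
        pos += len;
    }
    return true;
}
```

Splitting frames separately from parsing is one way to time file reading and message parsing independently, which may help narrow down where the time goes.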


