[protobuf] Re: suggestions on improving the performance?

alok Sun, 15 Jan 2012 18:41:04 -0800

Here is a link to a forum post that explains why I have to set the limit.

http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3wj2htwajjof+state:results


Excerpt from the link:

"The problem is that CodedInputStream has internal counter of how many
bytes are read so far with the same object.

In my case, there are a lot of small messages saved in the same file.
I do not read them at once and therefore do not care about large
messages, limits. I am safe.

So, the problem can be easily solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use-case is really about storing extremely large number (up to 1e9)
of small messages ~ 10K each. "
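
To make the excerpt concrete, here is a toy model of the behaviour it describes. This is not protobuf code: CountingReader, ReadMessage, SetTotalLimit and ReadWithSharedReader are hypothetical names. The only assumption is the one stated in the excerpt, namely that one reader object keeps a running count of all bytes it has read and compares it against a total limit.

```cpp
#include <cstddef>

// Toy stand-in for a reader that counts every byte it has ever consumed.
// Hypothetical class for illustration; NOT protobuf's CodedInputStream.
class CountingReader {
public:
    explicit CountingReader(std::size_t total_limit) : limit_(total_limit) {}

    // Consume one message of `size` bytes. Fails once the cumulative
    // count would exceed the limit, even if every message is small.
    bool ReadMessage(std::size_t size) {
        if (consumed_ + size > limit_) return false;
        consumed_ += size;
        return true;
    }

    // Raise the cumulative limit (the role SetTotalBytesLimit plays
    // in the excerpt above).
    void SetTotalLimit(std::size_t total_limit) { limit_ = total_limit; }

    std::size_t consumed() const { return consumed_; }

private:
    std::size_t limit_;
    std::size_t consumed_ = 0;
};

// Read up to `count` messages of `size` bytes each through ONE shared
// reader. Returns how many messages were read before the limit was hit.
std::size_t ReadWithSharedReader(std::size_t count, std::size_t size,
                                 std::size_t limit) {
    CountingReader reader(limit);
    std::size_t done = 0;
    while (done < count && reader.ReadMessage(size)) ++done;
    return done;
}
```

With 10-byte messages and a 64-byte limit, ReadWithSharedReader(10, 10, 64) stops after 6 messages although no single message is large. Raising the limit, as the excerpt suggests, lets all of them through.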


My problem is the same as the one above, so I will have to set the limits on
the coded input object.
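
For reference, here is a minimal stdlib-only sketch of the framing used in the code quoted further down (a 32-bit little-endian object type, a 32-bit little-endian length, then the payload), read with a single cursor that is reused for every object instead of a fresh reader per message. It does not use protobuf at all: Frame and ReadAllFrames are hypothetical names, and the in-memory buffer stands in for the file. The length check plays the role that PushLimit/PopLimit play in the quoted code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One framed object: its type tag and its raw (still unparsed) payload.
struct Frame {
    std::uint32_t type;
    std::vector<std::uint8_t> payload;
};

// Decode a 32-bit little-endian integer at `pos` and advance `pos`.
// Returns false if fewer than four bytes remain. Requires pos <= buf.size().
bool ReadLittleEndian32(const std::vector<std::uint8_t>& buf,
                        std::size_t& pos, std::uint32_t& out) {
    if (buf.size() - pos < 4) return false;
    out = static_cast<std::uint32_t>(buf[pos]) |
          (static_cast<std::uint32_t>(buf[pos + 1]) << 8) |
          (static_cast<std::uint32_t>(buf[pos + 2]) << 16) |
          (static_cast<std::uint32_t>(buf[pos + 3]) << 24);
    pos += 4;
    return true;
}

// Walk the whole buffer with ONE cursor. Each object is bounded by its
// own length prefix, so no per-object reader has to be created.
// Returns false on a truncated header or payload.
bool ReadAllFrames(const std::vector<std::uint8_t>& buf,
                   std::vector<Frame>& frames) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::uint32_t type = 0, len = 0;
        if (!ReadLittleEndian32(buf, pos, type)) return false;
        if (!ReadLittleEndian32(buf, pos, len)) return false;
        if (buf.size() - pos < len) return false;  // truncated payload
        Frame f;
        f.type = type;
        f.payload.assign(buf.begin() + static_cast<std::ptrdiff_t>(pos),
                         buf.begin() + static_cast<std::ptrdiff_t>(pos + len));
        frames.push_back(f);
        pos += len;
    }
    return true;
}
```

In the real reader each payload would be handed to the matching message type's parser; the sketch only shows that one reader plus a per-object length bound is enough to separate the objects.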

Regards,
Alok


On Jan 16, 10:26 am, alok <alok.jad...@gmail.com> wrote:
> I was actually doing that initially, but I kept getting an error along
> the lines of "Maximum length for a message is reached" (I don't have the
> exact error string at the moment). This was because my input binary file
> is large and it reaches the limit for coded input very quickly.
>
> I saw a post on the forum (or maybe on Stack Exchange) which suggested
> that I should create a new coded_input object for each message. I have
> to reset the limits for the coded input object. The user on that thread
> suggested that it is easy to create and destroy coded_input objects.
> These objects are not big.
>
> Anyway, I will try it again by resetting the limits on this object.
> But then, would this be causing the slowness? I will try and let you
> know the results.
>
> Regards,
> Alok
>
> On Jan 16, 9:46 am, Daniel Wright <dwri...@google.com> wrote:
>
> > You're making a new CodedInputStream for each message -- I think that gives
> > very poor buffering behavior.  You should just pass coded_input to
> > ReadAllMessages and keep reusing it.
>
> > Cheers
> > Daniel
>
> > On Sun, Jan 15, 2012 at 4:41 PM, alok <alok.jad...@gmail.com> wrote:
> > > Daniel,
>
> > > I am hoping that my code is incorrect, but I am not sure what is wrong
> > > or what is really causing this slowness.
>
> > > @Henner Zeller, sorry, I forgot to include the object length in the
> > > example above. I do store the object length for each object. I don't
> > > have issues reading all the objects; the code is working fine. I just
> > > want to make the code run faster now.
>
> > > attaching my code here...
>
> > > File format is
>
> > > File header
> > > Record1, Record2, Record3
>
> > > Each record contains n objects of the types defined in the proto file.
> > > The first object has a header which contains the number of objects in
> > > the record.
>
> > > <code>
> > > proto file
>
> > > message HeaderMessage {
> > >        required double timestamp = 1;
> > >  required string ric_code = 2;
> > >  required int32 count = 3;
> > >  required int32 total_message_size = 4;
> > > }
>
> > > message QuoteMessage {
> > >        enum Side {
> > >    ASK = 0;
> > >    BID = 1;
> > >  }
> > >  required Side type = 1;
> > >        required int32 level = 2;
> > >        optional double price = 3;
> > >        optional int64 size = 4;
> > >        optional int32 count = 5;
> > >        optional HeaderMessage header = 6;
> > > }
>
> > > message CustomMessage {
> > >        required string field_name = 1;
> > >        required double value = 2;
> > >        optional HeaderMessage header = 3;
> > > }
>
> > > message TradeMessage {
> > >        optional double price = 1;
> > >        optional int64 size = 2;
> > >        optional int64 AccumulatedVolume = 3;
> > >        optional HeaderMessage header = 4;
> > > }
>
> > > message AlphaMessage {
> > >        required int32 level = 1;
> > >        required double alpha = 2;
> > >        optional double stddev = 3;
> > >         optional HeaderMessage header = 4;
> > > }
>
> > > </code>
>
> > > <code>
> > > Reading records from binary file
>
> > > > bool ReadNextRecord(CodedInputStream *coded_input, stdext::hash_set<std::string> instruments)
> > > {
> > >        uint32 count, objtype, objlen;
> > >        int i;
>
> > >        int objectsread = 0;
> > >        HeaderMessage *hMsg = NULL;
> > >        TradeMessage tMsg;
> > >        QuoteMessage qMsg;
> > >        CustomMessage cMsg;
> > >        AlphaMessage aMsg;
>
> > >        while(1)
> > >        {
> > >                if(!coded_input->ReadLittleEndian32(&objtype)) {
> > >                        return false;
> > >                }
> > >                if(!coded_input->ReadLittleEndian32(&objlen)) {
> > >                        return false;
> > >                }
> > >                CodedInputStream::Limit lim =
> > > coded_input->PushLimit(objlen);
>
> > >                switch(objtype)
> > >                {
> > >                case 2:
> > >                        qMsg.ParseFromCodedStream(coded_input);
> > >                        if(qMsg.has_header())
> > >                        {
> > >                                //hMsg =
> > >                                hMsg = new HeaderMessage();
> > >                                hMsg->Clear();
> > >                                hMsg->Swap(qMsg.mutable_header());
> > >                        }
> > >                        objectsread++;
> > >                        break;
>
> > >                case 3:
> > >                        tMsg.ParseFromCodedStream(coded_input);
> > >                        if(tMsg.has_header())
> > >                        {
> > >                                //hMsg = tMsg.mutable_header();
> > >                                hMsg = new HeaderMessage();
> > >                                hMsg->Clear();
> > >                                hMsg->Swap(tMsg.mutable_header());
> > >                        }
> > >                        objectsread++;
> > >                        break;
>
> > >                case 4:
> > >                        aMsg.ParseFromCodedStream(coded_input);
> > >                        if(aMsg.has_header())
> > >                        {
> > >                                //hMsg = aMsg.mutable_header();
> > >                                hMsg = new HeaderMessage();
> > >                                hMsg->Clear();
> > >                                hMsg->Swap(aMsg.mutable_header());
> > >                        }
> > >                        objectsread++;
> > >                        break;
>
> > >                case 5:
> > >                        cMsg.ParseFromCodedStream(coded_input);
> > >                        if(cMsg.has_header())
> > >                        {
> > >                                //hMsg = cMsg.mutable_header();
> > >                                hMsg = new HeaderMessage();
> > >                                hMsg->Clear();
> > >                                hMsg->Swap(cMsg.mutable_header());
> > >                        }
> > >                        objectsread++;
> > >                        break;
>
> > >                default:
> > >                        cout << "Invalid object type "<< objtype <<
> > > endl;
> > >                        return false;
> > >                        break;
> > >                }
> > >                coded_input->PopLimit(lim);
> > >                if(objectsread == hMsg->count()) break;
> > >        }
> > >        return true;
> > > }
>
> > > void ReadAllMessages(ZeroCopyInputStream *raw_input,
> > > stdext::hash_set<std::string> instruments)
> > > {
> > >        int item_count = 0;
> > >        while(1)
> > >        {
> > >                CodedInputStream in(raw_input);
> > >                if(!ReadNextRecord(&in, instruments))
> > >                        break;
> > >                item_count++;
> > >        }
> > >        cout << "Finished reading file. Total "<<item_count<<" items
> > > read."<<endl;
> > > }
>
> > > int _tmain(int argc, _TCHAR* argv[])
> > > {
> > >        GOOGLE_PROTOBUF_VERIFY_VERSION;
>
> > >        ZeroCopyInputStream *raw_input;
> > >        CodedInputStream *coded_input;
> > >        stdext::hash_set<std::string> instruments;
>
> > >        string filename = "S:/users/aaj/sandbox/tickdata/bin/hk/
> > > 2011/2011.01.04.bin";
> > >        int fd = _open(filename.c_str(), _O_BINARY | O_RDONLY);
>
> > >        if( fd == -1 )
> > >        {
> > >                printf( "Error opening the file. \n" );
> > >                exit( 1 );
> > >        }
>
> > >        raw_input = new FileInputStream(fd);
> > >        coded_input = new CodedInputStream(raw_input);
>
> > >        uint32 magic_no;
>
> > >        coded_input->ReadLittleEndian32(&magic_no);
>
> > >        cout << "HEADER: " << "\t" << magic_no<<endl;
> > >        cout << "Reading data objects.." << endl;
> > >        delete coded_input;
> > >        cout << td << '\n';
>
> > >        ReadAllMessages(raw_input, instruments);
>
> > >        cout << td << '\n';
>
> > >        delete raw_input;
> > >        _close(fd);
> > >        google::protobuf::ShutdownProtobufLibrary();
>
> > >        return 0;
> > > }
>
> > > </code>
>
> > > > On Jan 14, 3:37 am, Henner Zeller <henner.zel...@googlemail.com> wrote:
> > > > On Fri, Jan 13, 2012 at 11:22, Daniel Wright <dwri...@google.com> wrote:
> > > > > > It's extremely unlikely that text parsing is faster than binary parsing on
> > > > > > pretty much any message.  My guess is that there's something wrong in the
> > > > > > way you're reading the binary file -- e.g. no buffering, or possibly a bug
> > > > > > where you hand the protobuf library multiple messages concatenated together.
>
> > > > In particular, the
> > > >    object type, object, object type object ..
> > > > doesn't seem to include headers that describe the length of the
> > > > following message, but such a separator is needed.
> > > > > (http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...)
>
> > > > >  It'd be easier to comment if you post the code.
>
> > > > > Cheers
> > > > > Daniel
>
> > > > > On Fri, Jan 13, 2012 at 1:22 AM, alok
>
