[protobuf] Re: Issue 248 in protobuf: protobuf will not compile without thread library

2010-12-07 Thread protobuf

Updates:
Status: Accepted
Owner: ken...@google.com

Comment #2 on issue 248 by ken...@google.com: protobuf will not compile  
without thread library

http://code.google.com/p/protobuf/issues/detail?id=248

Hmm.  I don't think we should automatically fall back to thread-hostile  
code when no threading library is available -- this could cause really  
hard-to-debug problems if it happened by accident.  But we could certainly  
provide a way for the user to explicitly ask for this, e.g. a  
--without-thread-safety configure option.


--
You received this message because you are subscribed to the Google Groups "Protocol 
Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] 2.4.0 and lazy UTF-8 conversions in Java

2010-12-07 Thread Kenton Varda
On Wed, Dec 1, 2010 at 3:33 AM, Evan Jones  wrote:

> The instanceof approach to switch between the two is a good idea. When I
> wrote my implementation, I was concerned about the thread-safeness issues,
> although I don't think I ever considered this particular version. However, I
> think this can be made thread-safe, even without volatile (although I only
> understand the JMM enough to be dangerous).
>

Well, Jeremy Manson literally wrote the book on the Java memory model, and
he says it works.  :)




Re: [protobuf] Any Python wrappers for the C++ implementation?

2010-12-07 Thread Kenton Varda
Cool.  Serialization and parsing themselves should actually be improved even
more than that, but having other Python code around it waters down the
numbers.  :)  Also, note that if you explicitly compile C++ versions of your
messages and link them into the process, they'll be even faster.  (If you
don't, the library falls back to DynamicMessage which is not as fast as
generated code.)

As for when 2.4.0 might be released, it's hard to say.  There's a lot of
work to do, and we have a new person doing this release so he has to learn
the process.  Also, holidays are coming up.  So, I'd guess it will be ready
sometime in January.

On Wed, Dec 1, 2010 at 1:54 PM, Yang Zhang  wrote:

> FWIW I'm seeing ~12x and ~7x speed-ups on serialization and parsing,
> respectively, for messages in our app (which are ~10KB serialized) -
> not too shabby!
>
> $ python sandbox/pbbench.py out.ini # time in seconds per msg serialization
> ser: 0.000434461673101
> parse: 0.000602062404156
>
> $ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py
> out.ini
> ser: 2.86788344383e-05
> parse: 7.63910810153e-05
>
> Yang
>
>
> On Wed, Dec 1, 2010 at 12:07 PM, Yang Zhang wrote:
> > Thanks Kenton, we'll take a look. Out of curiosity, any ETA on 2.4.0?
> >
> > On Wed, Dec 1, 2010 at 12:04 PM, Kenton Varda  wrote:
> >> Protobuf 2.4.0 will include an implementation of the Python API that is
> >> backed by C++ objects.  The interface is identical to the existing Python
> >> API, and you can wrap it around existing C++ objects or have it construct
> >> its own.
> >> This code is already in SVN.  Unfortunately the team is somewhat backlogged
> >> and we haven't been able to make a lot of progress on an official release.
> >> But it should be a lot easier to get the SVN code working than to write
> >> your own.  :)
> >> On Tue, Nov 30, 2010 at 11:22 PM, Yang Zhang wrote:
> >>>
> >>> Has anyone written (a tool for generating) Python wrappers around the
> >>> C++ generated code and is willing to share this? I'm looking to do the
> >>> same, so this would save me a bit of research time. (It's fine if it's
> >>> not a general tool and this is specific to some schema.) Thanks!
> >>>
> >
>
>
>
> --
> Yang Zhang
> http://yz.mit.edu/
>




Re: [protobuf] protobuf json codec

2010-12-07 Thread Kenton Varda
Hi Siju,

This is pretty cool.  I added it to the third-party add-ons wiki.
  http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns

Just a minor complaint:  You are not Google, so your code should not live
under the com.google package.  Can you please move it to a different
package?  Perhaps com.github.sijuv would be appropriate?

On Wed, Dec 1, 2010 at 9:11 PM, Siju  wrote:

> I have written a json codec for protobuf which uses jackson as the
> underlying parsing framework. It is twice as fast for json-protobuf
> conversion compared to the json codec at
> http://code.google.com/p/protobuf-java-format/
> and about the same for protobuf-json conversion.
>
> This is the first draft, and I plan to plug in other serialization
> schemes like xml.
>
> Appreciate feedback from the community.
>
>
>




Re: [protobuf] Preferred Citation?

2010-12-07 Thread Kenton Varda
There's a (slightly outdated) list of contributors here:
  http://code.google.com/p/protobuf/source/browse/trunk/CONTRIBUTORS.txt

But I think it's probably best to list Google as the author.  I did not
invent protocol buffers; I just wrote version 2 and open sourced it, and I
had help.

I don't really know how citations work, though.

On Fri, Dec 3, 2010 at 7:25 PM, Dan Homerick  wrote:

> I'm writing up a master's project MS which has used protobufs extensively.
> Is there a preferred article for crediting protobufs with? If not, I could
> cite it with something like (Bibtex):
>
> @MISC{protobuf,
>   title={Protocol Buffers},
>   author={Kenton Varda},
>   howpublished={\url{http://code.google.com/apis/protocolbuffers/}},
> }
>
> but I'm sure that the authors list is woefully incomplete. Is there a more
> complete list of authors available?
>
> Alternatively, I could just put the project's URL in a footnote.
>
> - Dan
>
>




[protobuf] Re: Issue 196 in protobuf: Python: Ascii output is not assured to be in utf-8

2010-12-07 Thread protobuf

Updates:
Status: Accepted
Labels: -FixedIn-2.4.0

Comment #3 on issue 196 by ken...@google.com: Python: Ascii output is not  
assured to be in utf-8

http://code.google.com/p/protobuf/issues/detail?id=196

Jisi, I'm not convinced that this is fixed.  The as_utf parameter simply  
prevents the printer from escaping character codes >= 128.  The bug report  
seems like it may actually be a problem in the parser.  Also, round trips  
should work correctly even if as_utf is not used.  We should investigate  
further, and make sure we have test cases that print and then parse a  
message containing Unicode characters, both with and without as_utf enabled.





Re: [protobuf] Re: java parse with class known at runtime (and compiled proto)

2010-12-07 Thread Kenton Varda
Evan is correct.  The best way to write code which deals with a generic
protobuf type is to have it take a default instance as a parameter.  From
that you can do everything else.

This is actually better than passing around Class objects, because it allows
users to use DynamicMessages with your code.  Using Class objects forces
users to use only generated types.  Also, Java reflection may be slow or
even unavailable on some platforms.
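
A rough sketch of the registry idea, written in Python for brevity.  Prototype
and Registry are hypothetical stand-ins, not protobuf classes, and the type
names are made up.  With the real Java API the parse step would be
prototype.newBuilderForType().mergeFrom(bytes).build().

```python
class Prototype:
    """Hypothetical stand-in for a protobuf default instance."""

    def __init__(self, type_name, decode):
        self.type_name = type_name
        self._decode = decode

    def parse(self, data):
        # Stand-in for newBuilderForType().mergeFrom(data).build().
        return self._decode(data)


class Registry:
    """Maps a type name (e.g. carried in a wrapper message) to a prototype."""

    def __init__(self):
        self._prototypes = {}

    def register(self, prototype):
        self._prototypes[prototype.type_name] = prototype

    def parse(self, type_name, data):
        try:
            prototype = self._prototypes[type_name]
        except KeyError:
            raise ValueError("unknown message type: %r" % type_name)
        return prototype.parse(data)


# Two made-up message types with trivial decoders, for illustration only.
registry = Registry()
registry.register(Prototype("example.Text", lambda b: b.decode("utf-8")))
registry.register(Prototype("example.Count", lambda b: int.from_bytes(b, "big")))
```

The calling code only ever sees the registry and a type name, so it does not
care whether a prototype came from generated code or from a DynamicMessage.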

On Mon, Dec 6, 2010 at 7:51 AM, Evan Jones  wrote:

> On Dec 6, 2010, at 10:31 , Koert Kuipers wrote:
>
>> But that doesn't make a parseFrom() in message interface invalid, does it?
>> Indeed some other information outside the raw bytes will be needed to pick
>> the right Message subclass. But that's fine.
>>
>
> Oh, sorry, I misunderstood your question, so my answer is somewhat invalid.
>
>
>  One could then:
>> 1) pick the right subclass of Message based upon some information outside
>> the raw bytes (in my case something stored in a protobuf wrapper around the
>> raw bytes)
>> 2) call subclass.parseFrom(bytes)
>>
>> now we have to jump through more hoops for step 2 (create instance of
>> Message subclass, newBuilderForType, mergeFrom, isInitialized, build)
>>
>
> The MessageLite.Builder interface has a mergeFrom method that does what you
> want. What you should do is something like:
>
> * Get a MessageLite instance for the message type you want to parse (eg.
> something like MyMessageType.getDefaultInstance(), or
> MessageLite.getDefaultInstanceForType())
> * Hold on to that MessageLite instance in some sort of registry.
> (HashMap?)
> * When you get a message, look at the protobuf wrapper to determine the
> type.
> * Look up the "prototype" MessageLite instance in your registry.
> * Call prototypeInstance.newBuilderForType().mergeFrom(bytes).build()
>
> This only creates a single instance of the message each time. The .build()
> method will automatically check that the message is initialized, so you
> don't need to call isInitialized (although you may want to catch the
> exception it could throw?).
>
> This Builder pattern is used so that the Message objects are immutable.
> This means they can be passed between threads without requiring any
> synchronization. See:
>
> http://code.google.com/apis/protocolbuffers/docs/javatutorial.html#builders
>
> Hope this helps,
>
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/
>
>
>




Re: [protobuf] Nullpointer Exception when .parseFrom(byte [])

2010-12-07 Thread Kenton Varda
The file you attached is not useful without source code.

But what would be even better is if you could provide the stack trace for
the NullPointerException.  The parser should never throw NPE and I'm not
aware of any bugs which cause it to throw NPE.

BTW, it's very easy for bytes which are not actually a protocol buffer to
parse correctly as a protocol buffer.  The protobuf encoding is pretty
dense.  So I would suggest that you instead use some sort of tagging
mechanism to distinguish protobuf data from other kinds of data that you
need to process.
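
As a minimal sketch of such a tagging mechanism (in Python, and purely an
assumption on my part: the MAGIC value and the wrap/unwrap helpers are made up
and are not part of protobuf), you could prefix every protobuf payload with a
fixed marker and check for it before parsing:

```python
# Hypothetical 4-byte marker chosen by the application; protobuf itself
# defines no such tag.
MAGIC = b"PBUF"


def wrap(payload):
    """Prefix serialized protobuf bytes with the application's marker."""
    return MAGIC + payload


def unwrap(data):
    """Return the payload if data carries the marker, otherwise None."""
    if data.startswith(MAGIC):
        return data[len(MAGIC):]
    return None
```

Only data for which unwrap() returns a payload would then be handed to
parseFrom(); everything else is treated as some other kind of data.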

On Mon, Dec 6, 2010 at 2:41 AM, jim horng  wrote:

> Hi all,
>
> It occurred to me that when I try to parse any of the binary data
> files listed below using .parseFrom(byte[]), I get a
> NullPointerException.
>
> To the best of my knowledge, when the raw bytes do not contain a
> required field, the protobuf API should throw
> InvalidProtocolBufferException from .parseFrom(), or at least throw
> UninitializedMessageException from .build(),
> so that the client knows what happened and has a chance to handle it
> correctly.
> But if it throws NullPointerException, it is hard to know what
> really happened; one can only roughly assume that message parsing
> failed.
> My goal is just to determine whether a received binary file is an
> expected protobuf message or not.
>
> Please kindly advise, or let me know if this works as designed for some reason.
> Thanks a lot !
>
>
>
> 
>   String filepath = "np1.pb";
>   byte[] file_byteary = getBytesFromFile(new File(filepath));
>
>   xxx.Connection.Builder conn_builder = xxx.Connection.newBuilder();
>   xxx.Connection MB_tmp = xxx.Connection.parseFrom(file_byteary);
>   conn_builder.mergeFrom(MB_tmp);
>   System.out.println("is init?" + conn_builder.isInitialized());
>   assertTrue(conn_builder.build() != null);
> 
>
> * test data
>
> https://docs.google.com/leaf?id=0B9CkCymjWlZLNWMyMGNiNjEtMDJiZi00NDY0LTgzMWUtMjI1N2I0MWVhNjlh&hl=en
>
>
>




Re: [protobuf] "File already exists in database" on Mac OSX

2010-12-07 Thread Kenton Varda
I assume this is happening at startup?  Is it possible that you are somehow
linking two copies of the same .pb.cc into your binary?  There are cases
where the linker doesn't catch this, especially when dynamic linking is
involved, but sometimes even with static linking.  Speaking of which, are
you linking dynamically or statically?

I am not aware of any outstanding issues that would cause this, but C++
initialization behavior can be crazy complicated and can even vary between
platforms.
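
For what it's worth, here is a toy model (in Python) of why a second copy of
the same .pb.cc would trip this check.  It is only a sketch, assuming each
generated file registers its descriptor once at startup: FileDatabase and
run_static_initializers are made-up names, and the real C++ registration code
is more involved.

```python
class FileDatabase:
    """Toy stand-in for a process-wide descriptor database."""

    def __init__(self):
        self._files = {}

    def add(self, filename, encoded_descriptor):
        # A second registration under the same file name is rejected.
        if filename in self._files:
            raise RuntimeError("File already exists in database: " + filename)
        self._files[filename] = encoded_descriptor


def run_static_initializers(database, linked_objects):
    """Register every linked copy, the way startup code would."""
    for filename, encoded_descriptor in linked_objects:
        database.add(filename, encoded_descriptor)
```

If the same file shows up twice in linked_objects (say, once in a shared
library and once in the main binary), the second add() fails even though each
copy is fine on its own.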

On Mon, Dec 6, 2010 at 8:05 AM, Alex Nixon  wrote:

> Hello,
>
> I'm receiving a "File already exists in database" error from the
> protocol buffers library at runtime on Mac OSX, and I'm unsure as to
> what's causing it or where to look to isolate the problem.  I have the
> exact same codebase compiling on Windows and Linux and they both work
> fine - so I wondered if this is a protocol buffers issue?  That said,
> I was unable to find any outstanding issues in the bug tracker which
> match my symptoms.
>
> All of my .proto files have different names and are in the same
> directory.  The generated .pb.h and .pb.cc files on Mac OSX are
> identical to those generated on Linux.
>
> Any advice would be much appreciated.
>
> Thanks in advance,
> - Alex Nixon
>
>
>




[protobuf] Re: Issue 247 in protobuf: Ability to redirect file output to stdout

2010-12-07 Thread protobuf


Comment #2 on issue 247 by ken...@google.com: Ability to redirect file  
output to stdout

http://code.google.com/p/protobuf/issues/detail?id=247

What happens if multiple files are generated?  For example, if you  
have "option java_multiple_files = true;" in your .proto file.  This seems  
like it could get weird.





[protobuf] 3 issues changed in protobuf

2010-12-07 Thread protobuf

Updates:
Status: Fixed
Labels: FixedIn-2.4.0

Comment by liuj...@google.com:
Fixed in r358

Affected issues:
  issue 223: Python Package Doesn't Contain Compiler Plugin Module
http://code.google.com/p/protobuf/issues/detail?id=223

  issue 224: Protobuf installation error in Cygwin
http://code.google.com/p/protobuf/issues/detail?id=224

  issue 242: Can no longer install protobuf on pypy
http://code.google.com/p/protobuf/issues/detail?id=242



--
You received this message because you are listed in the owner
or CC fields of these issues, or because you starred them.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings




[protobuf] Re: Issue 179 in protobuf: Visual C++ error C1091 when compiling protoc generated code with over 64k descriptor

2010-12-07 Thread protobuf

Updates:
Status: New

Comment #2 on issue 179 by ken...@google.com: Visual C++ error C1091 when  
compiling protoc generated code with over 64k descriptor

http://code.google.com/p/protobuf/issues/detail?id=179

This proves complicated to fix, because we have lots of code that passes  
around this blob as a pointer and a size.  We'd have to update all that  
code to pass around a pointer to a more complex data structure which might  
contain multiple segments.  Any fix would have to be used with all  
compilers, even ones which don't have this limitation.
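
To illustrate what "multiple segments" would mean, here is a small Python
sketch.  The function names and the 65535-byte limit are assumptions made for
the example; the actual change would have to be made in the C++ code that
consumes the blob.

```python
def split_segments(blob, limit=65535):
    """Split a descriptor blob into pieces no longer than limit bytes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [blob[i:i + limit] for i in range(0, len(blob), limit)]


def join_segments(segments):
    """Reassemble the original blob from its segments."""
    return b"".join(segments)
```

Every consumer that currently takes a pointer and a size would need either the
list of segments or the joined copy, which is where the complication comes
from.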


Your example is a 10MB source file.  I don't think we care to support 10MB  
source files.  Realistically, if a file is so big that its descriptor is  
over 64k, the best solution may be to simply split up the file.  If you are  
stuffing an excessive amount of information in custom options, it may be  
time to consider a different format for that information.


I'd like to see more demand before we put effort into fixing this.




Re: [protobuf] Any Python wrappers for the C++ implementation?

2010-12-07 Thread Yang Zhang
On Tue, Dec 7, 2010 at 7:08 PM, Kenton Varda  wrote:
> Cool.  Serialization and parsing themselves should actually be improved even
> more than that, but having other Python code around it waters down the
> numbers.  :)

The times are from a minimal microbenchmark using Python's timeit module:

nruns = 1000
nwarmups = 100

es = ... # the protobufs

def ser():
  return [e.SerializeToString() for e in es]

def parse(ses):
  for se in ses: pb.Email().ParseFromString(se)

t = timeit.Timer(lambda:None)
t.timeit(nwarmups)
print 'noop:', t.timeit(nruns) / nruns

t = timeit.Timer(ser)
t.timeit(nwarmups)
print 'ser:', t.timeit(nruns) / nruns / len(es)

ses = ser()
t = timeit.Timer(lambda: parse(ses))
t.timeit(nwarmups)
print 'parse:', t.timeit(nruns) / nruns / len(es)

print 'msg size:', sum(len(se) for se in ses) / len(ses)

> Also, note that if you explicitly compile C++ versions of your
> messages and link them into the process, they'll be even faster.  (If you
> don't, the library falls back to DynamicMessage which is not as fast as
> generated code.)

I'm trying to decipher that last hint, but having some trouble - what
exactly do you mean / how do I do that? I'm just using protoc
--py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.

> As for when 2.4.0 might be released, it's hard to say.  There's a lot of
> work to do, and we have a new person doing this release so he has to learn
> the process.  Also, holidays are coming up.  So, I'd guess it will be ready
> sometime in January.

Thanks for the estimate; even a ballpark without commitment is useful.

-- 
Yang Zhang
http://yz.mit.edu/




[protobuf] Re: Issue 210 in protobuf: Java code should detect incompatible runtime library version

2010-12-07 Thread protobuf

Updates:
Cc: liuj...@google.com

Comment #13 on issue 210 by liuj...@google.com: Java code should detect  
incompatible runtime library version

http://code.google.com/p/protobuf/issues/detail?id=210

Hmm, not very clear about the maven version compatibility... So suppose you  
have a pom.xml:




  
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>2.3</version>
</dependency>
  



will it also accept protobuf-java-2.2.0.jar ? Or did you simply write  
2?





Re: [protobuf] Any Python wrappers for the C++ implementation?

2010-12-07 Thread Kenton Varda
On Tue, Dec 7, 2010 at 9:19 PM, Yang Zhang  wrote:

> > Also, note that if you explicitly compile C++ versions of your
> > messages and link them into the process, they'll be even faster.  (If you
> > don't, the library falls back to DynamicMessage which is not as fast as
> > generated code.)
>
> I'm trying to decipher that last hint, but having some trouble - what
> exactly do you mean / how do I do that? I'm just using protoc
> --py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.
>

I'm not completely sure what I mean, because I don't have much experience
with Python C Extensions.  Basically I'm saying you should additionally
generate C++ code using protoc, then compile that into a C extension (even
with no interface), and then load it into your Python process.  Simply
having the C++ code for your message types present will make them faster.

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: Issue 210 in protobuf: Java code should detect incompatible runtime library version

2010-12-07 Thread protobuf


Comment #14 on issue 210 by aantono: Java code should detect incompatible  
runtime library version

http://code.google.com/p/protobuf/issues/detail?id=210

It would if you specify what Maven calls a range, e.g. [1.0,2.0), which  
means any version from 1.0 (inclusive) up to, but excluding, 2.0.  If you  
just leave it as 2.3, that is what Maven calls a "soft" requirement on 2.3  
(just a recommendation that helps select the correct version if it matches  
all ranges).  It basically means that you would like 2.3 if possible, but  
Maven will do its best to select the nearest matching version (by default  
within the minor range, keeping the same major).
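
To make the interval notation concrete, here is a small Python sketch of how a
range like [1.0,2.0) can be read.  This is not Maven's resolver: parse_version
and in_range are names invented for the example, and only plain numeric
versions with up to three components are handled.

```python
def parse_version(text, width=3):
    """Turn "2.3" into (2, 3, 0) so versions compare component by component."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(spec, version):
    """Check a version against interval notation such as "[1.0,2.0)"."""
    lower_inclusive = spec[0] == "["
    upper_inclusive = spec[-1] == "]"
    low, high = (bound.strip() for bound in spec[1:-1].split(","))
    v = parse_version(version)
    if low:
        lo = parse_version(low)
        if v < lo or (v == lo and not lower_inclusive):
            return False
    if high:
        hi = parse_version(high)
        if v > hi or (v == hi and not upper_inclusive):
            return False
    return True
```

With this reading, in_range("[1.0,2.0)", "1.9.9") holds while
in_range("[1.0,2.0)", "2.0") does not, because the closing parenthesis
excludes the upper bound.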





Re: [protobuf] Any Python wrappers for the C++ implementation?

2010-12-07 Thread Yang Zhang
On Tue, Dec 7, 2010 at 9:40 PM, Kenton Varda  wrote:
> On Tue, Dec 7, 2010 at 9:19 PM, Yang Zhang  wrote:
>>
>> > Also, note that if you explicitly compile C++ versions of your
>> > messages and link them into the process, they'll be even faster.  (If
>> > you
>> > don't, the library falls back to DynamicMessage which is not as fast as
>> > generated code.)
>>
>> I'm trying to decipher that last hint, but having some trouble - what
>> exactly do you mean / how do I do that? I'm just using protoc
>> --py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.
>
> I'm not completely sure what I mean, because I don't have much experience
> with Python C Extensions.  Basically I'm saying you should additionally
> generate C++ code using protoc, then compile that into a C extension (even
> with no interface), and then load it into your Python process.  Simply
> having the C++ code for your message types present will make them faster.

Ah, my understanding now is that:

- Python code ordinarily (without
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp) uses pure Python
(generated code) to parse/serialize messages.

- Python code *with* PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp uses
generic C++ code that dynamically parses/serializes messages (via
DynamicMessage/reflection), as opposed to using any pre-generated C++
code.

- Python code with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp actually
also *searches for the symbols for any pre-generated C++ code in the
current process*, and uses them if available instead of
DynamicMessage...? (This is via some global DescriptorPool magic?)

Sounds like pretty weird behavior, but indeed, now I get even faster
processing. The following run shows ~68x and ~13x speedups vs. ~15x
and ~8x (my original speedup calculations were ~15x and ~8x, not ~12x
and ~7x...not sure how I got those, I probably was going off a
different set of measurements):

$ PYTHONPATH=build/lib.linux-x86_64-2.6/:$PYTHONPATH
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py
out.ini
noop: 1.6188621521e-07
ser: 6.39575719833e-06
parse: 4.55250144005e-05
msg size: 10730
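
For anyone who wants to check the arithmetic, the ratios can be recomputed from
the per-message times posted in this thread.  The variable and function names
below are just for this sketch:

```python
# Seconds per message, copied from the benchmark output in this thread.
pure_python = {"ser": 0.000434461673101, "parse": 0.000602062404156}
cpp_dynamic = {"ser": 2.86788344383e-05, "parse": 7.63910810153e-05}
cpp_generated = {"ser": 6.39575719833e-06, "parse": 4.55250144005e-05}


def speedups(baseline, candidate):
    """Ratio of baseline time to candidate time for each operation."""
    return {op: baseline[op] / candidate[op] for op in baseline}


dynamic_speedup = speedups(pure_python, cpp_dynamic)
generated_speedup = speedups(pure_python, cpp_generated)

for op in ("ser", "parse"):
    print("%s: %.1fx with DynamicMessage, %.1fx with generated C++"
          % (op, dynamic_speedup[op], generated_speedup[op]))
```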

This was simple to do. I added a C extension to my setup.py:

<<<
setup(
...
ext_modules=[Extension('podpb',
sources=['cpp/podpb.c','cpp/main.pb.cc'], libraries=['protobuf'])],
...
)
>>>

Generate the second source file with `protoc --cpp_out=cpp`, and
create the first one to set up an empty Python module:

<<<
#include <Python.h>

static PyMethodDef PodMethods[] = {
  {NULL, NULL, 0, NULL}  /* Sentinel */
};

PyMODINIT_FUNC
initpodpb(void)
{
  PyObject *m;

  m = Py_InitModule("podpb", PodMethods);
  if (m == NULL)
    return;
}
>>>

Now `python setup.py build` should build everything. Just import the
module (podpb in our case) and you're good.

Awesome tip, thanks Kenton. I foresee additions to the documentation
in protobuf's near future :)
--
Yang Zhang
http://yz.mit.edu/
