Slicing support in Python
(Okay, back on track) On Tue, Dec 2, 2008 at 11:17 PM, Kenton Varda [EMAIL PROTECTED] wrote: On Tue, Dec 2, 2008 at 11:08 PM, Alek Storm [EMAIL PROTECTED] wrote: I would think encoding and decoding would be the main bottlenecks, so can't those be wrappers around C++, while letting object handling (reflection.py and friends) be pure Python? It seems like the best of both worlds.

Well, the generated serializing and parsing code in C++ is an order of magnitude faster than the dynamic (reflection-based) code. But to use generated code you need to be using C++ object handling.

Not if you decouple them. Abstractly, the C++ parser receives a serialized message and descriptor and returns a tree of the form [(tag_num, value)], where tag_num is an integer and value is either a scalar or a subtree (for submessages). The Python reflection code takes the tree and fills the message object with its values. It's simple, fast, and the C++ parser can be easily swapped out for a pure-Python one on systems that don't support the C++ version. Run this backwards when serializing, and you get another advantage: you can easily swap out the function that converts the tree into serialized protobuf for one that outputs XML, JSON, etc.

You're right. If it's a waste of time for them, most people won't use it. But if there's no point to it, why do normal Python lists have it? It's useful enough to be included there. And since repeated fields act just like lists, it should be included here too.

I think Python object lists are probably used in a much wider variety of ways than protocol buffer repeated fields generally are.

Let's include it - it gives us a more complete list interface, there's no downside, and the users can decide whether they want to use it. We can't predict all possible use cases. In fact, it doesn't even have to be useful for repeated composites.
The fact that repeated scalars have it means it's automatically included for repeated composites, because they should have the exact same interface. Polymorphism is what we want here. But they already can't have the same interface because append() doesn't work. :) We don't have confirmation on that yet ;). Having the same interface is what we should be shooting for. Thanks, Alek Storm --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---
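The [(tag_num, value)] tree described above can be sketched in pure Python. This is a hypothetical illustration of the idea, not protobuf library code: `decode_tree` and its `message_fields` argument (a stand-in for the descriptor knowledge of which field numbers hold submessages) are invented names, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
def _read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_tree(buf, message_fields=frozenset()):
    """Parse wire-format bytes into a [(tag_num, value)] tree.

    message_fields stands in for descriptor knowledge: field numbers
    whose length-delimited payload should be parsed as a submessage.
    """
    tree, pos = [], 0
    while pos < len(buf):
        key, pos = _read_varint(buf, pos)
        tag_num, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint scalar
            value, pos = _read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = _read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
            value = (decode_tree(payload, message_fields)
                     if tag_num in message_fields else payload)
        else:
            raise NotImplementedError("wire type %d" % wire_type)
        tree.append((tag_num, value))
    return tree
```

For example, the classic wire encoding of a varint field 1 set to 150 (`08 96 01`) decodes to `[(1, 150)]`, and wrapping it as a submessage in field 3 yields a nested subtree.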
Re: Slicing support in Python
Hi Kenton and Petar, Sorry I haven't been able to reply for a few days; I've been so swamped this week. Hopefully I'll be able to conjure up an intelligent reply tomorrow :) Cheers, Alek Storm
Re: Slicing support in Python
On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda [EMAIL PROTECTED] wrote: Sorry, I think you misunderstood. The C++ parsers generated by protoc (with optimize_for = SPEED) are an order of magnitude faster than the dynamic *C++* parser (used with optimize_for = CODE_SIZE and DynamicMessage). The Python parser is considerably slower than either of them, but that's beside the point. Your decoupled parser which produces a tag/value tree will be at least as slow as the existing C++ dynamic parser, probably slower (since it sounds like it would use some sort of dictionary structure rather than flat classes/structs).

Oh, I forgot we have two C++ parsers. The method I described uses the generated (SPEED) parser, so it should be a great deal quicker. It just outputs a tree instead of a message, leaving the smart object creation to Python.

Run this backwards when serializing, and you get another advantage: you can easily swap out the function that converts the tree into serialized protobuf for one that outputs XML, JSON, etc.

You can already easily write encoders and decoders for alternative formats using reflection.

Honestly, I think using reflection for something as basic as changing the output format is hackish and could get ugly. Reflection should only be used in certain circumstances, e.g., generating message objects, because it exposes the internals. There's a chance we could change how Protocol Buffers works under the hood in a way that screws up an XML outputter, which wouldn't happen if we just expose a clean interface.

Let's include it - it gives us a more complete list interface, there's no downside, and the users can decide whether they want to use it. We can't predict all possible use cases.

Ah, yes, the old "Why not?" argument. :) Actually, I far prefer the opposite argument: If you aren't sure if someone will want a feature, don't include it. There is always a down side to including a feature.
Even if people choose not to use it, it increases code size, maintenance burden, memory usage, and interface complexity. Worse yet, if people do use it, then we're permanently stuck with it, whether we like it or not. We can't change it later, even if we decide it's wrong. For example, we may decide later -- based on an actual use case, perhaps -- that it would really have been better if remove() compared elements by content rather than by identity, so that you could remove a message from a repeated field by constructing an identical message and then calling remove(). But we wouldn't be able to change it. We'd have to instead add a different method like removeByValue(), which would be ugly and add even more complexity. Protocol Buffers got where they are by stubbornly refusing the vast majority of feature suggestions. :)

Ha, I thought you might say that. It's a good philosophy, and I completely understand where you're coming from. So I concede that point, and it all boils down to complete interface vs. compact interface. But just for the record, I'm pretty sure Python's list remove() method compares by value, and doesn't have a method that compares by identity. So there would be no reason to include a compare-by-identity method in protobuf repeated fields.

That said, you do have a good point that the interface should be similar to standard Python lists if possible. But given the other problems that prevent this, it seems like a moot point.

Okay, you place more value on compact interface. So are we keeping remove() for scalar values? I think their interfaces should be consistent, but I don't think you think that's as important.

On Wed, Dec 3, 2008 at 10:25 AM, Petar Petrov [EMAIL PROTECTED] wrote: It's not that simple. We would also like to improve performance at least in MergeFrom/CopyFrom/ParseASCII/IsInitialized.

Okay. So let's say we have a pure-C++ parser with a Python wrapper. This brings us back to getting slicing to work in C++ with no garbage collector.
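On the remove() side point: Python's built-in list.remove() does indeed compare by equality rather than identity, which is easy to check. (`Msg` here is just a toy class invented for the demonstration, not a protobuf type.)

```python
class Msg:
    """Toy stand-in for a message, comparing by content."""
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return isinstance(other, Msg) and self.x == other.x

items = [Msg(1), Msg(2)]
# remove() matches by equality, so a fresh-but-equal object works:
items.remove(Msg(2))
assert len(items) == 1 and items[0] == Msg(1)
```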
Kenton, could you elaborate on what you meant earlier by ownership problems specific to the C++ version? I can't really see anything that would affect PB repeated fields that isn't taken care of by handing the user control over allocation and deallocation of the field elements.

Currently each composite field has a reference to its parent. This makes it impossible to add the same composite to two different repeated composite fields. The .add() method guarantees that this never happens.

Is there anything wrong with having a list of parents? I'm guessing I'm being naive - would speed be affected too much by that?

I think protobuf's repeated composite fields aren't and shouldn't be equivalent to Python lists.

Okay, that's cleared up now. Thanks. Cheers, Alek Storm
Re: Slicing support in Python
On Sat, Dec 6, 2008 at 12:42 AM, Kenton Varda [EMAIL PROTECTED] wrote: On Fri, Dec 5, 2008 at 10:59 PM, Alek Storm [EMAIL PROTECTED] wrote: On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda [EMAIL PROTECTED] wrote: Sorry, I think you misunderstood. The C++ parsers generated by protoc (with optimize_for = SPEED) are an order of magnitude faster than the dynamic *C++* parser (used with optimize_for = CODE_SIZE and DynamicMessage). The Python parser is considerably slower than either of them, but that's beside the point. Your decoupled parser which produces a tag/value tree will be at least as slow as the existing C++ dynamic parser, probably slower (since it sounds like it would use some sort of dictionary structure rather than flat classes/structs).

Oh, I forgot we have two C++ parsers. The method I described uses the generated (SPEED) parser, so it should be a great deal quicker. It just outputs a tree instead of a message, leaving the smart object creation to Python.

No, the static (SPEED) parser parses to generated C++ objects. It doesn't make sense to say that we'll use the static parser to parse to this abstract tree structure, because the whole point of the static parser is that it parses to concrete objects. If it didn't, it wouldn't be so fast. (In fact, the biggest bottleneck in protobuf parsing is memory bandwidth, and I can't see how your tree structure would be anywhere near as compact as a generated message class.)

Gah, you're right. I was thinking of it the wrong way. I still kinda like it, but since apparently the abstraction required would negate the speed increase, I guess it's time to drop it.

Honestly, I think using reflection for something as basic as changing the output format is hackish and could get ugly.

I think you're thinking of a different kind of reflection. I'm talking about the google::protobuf::Reflection interface. The whole point of this interface is to allow you to do things like write custom codecs for things like JSON or XML.
Take a look at text_format.cc for an example usage.

Ah. I wasn't as familiar with the C++ version as I thought. Still, I thought it would be cool to have PB/XML/JSON/etc outputters operate at the same level.

If a message Foo has a repeated field of type Bar, then the Bar objects in that field are owned by Foo. When you delete Foo, all the Bars are deleted. Leaving it up to the user to delete the Bar objects themselves is way too much of a burden.

But it does give us a lot of cool functionality, like adding the same message to two parents, and (yes!) slicing support. I thought this was common practice in C++, but it's been quite a while since I've coded it.

Is there anything wrong with having a list of parents? I'm guessing I'm being naive - would speed be affected too much by that?

Way too complicated, probably a lot of overhead, and not very useful in practice.

Is it really that useful to have ByteSize() cached for repeated fields? If it's not, we get everything I mentioned above for free. I'm genuinely not sure - it only comes up when serializing the message in wire_format.py. What do you think? Cheers, Alek Storm
Re: Slicing support in Python
On Mon, Dec 8, 2008 at 1:16 PM, Kenton Varda [EMAIL PROTECTED] wrote: On Sat, Dec 6, 2008 at 1:03 AM, Alek Storm [EMAIL PROTECTED] wrote: Is it really that useful to have ByteSize() cached for repeated fields? If it's not, we get everything I mentioned above for free. I'm genuinely not sure - it only comes up when serializing the message in wire_format.py. What do you think?

Yes, it's just as necessary as it is with optional fields. The main problem is that the size of a message must be written before the message contents itself. If, while serializing, you call ByteSize() to get this size every time you write a message, then you'll end up computing the size of deeply-nested messages many times (once for each outer message within which they're nested). Caching avoids that problem.

Okay, then we just need to cache the size only during serialization. The children's sizes are calculated and stored, then added to the parent's size. Write the parent size, then write the parent, then the child size, then the child, on down the tree. Then it's O(n) (same as we have currently) and no ownership problems, because we can drop the weak reference from child to parent. Would that work? Cheers, Alek Storm
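The bottom-up scheme proposed here can be sketched on a toy message representation. This is only an illustration of the idea - `compute_sizes`, `_varint_size`, and the list-of-tuples message shape are invented for the example, not part of the protobuf API. One post-order pass computes and caches every message's size, so the serializer can then write each cached size before its message without ever re-measuring a nested message, keeping the whole thing O(n).

```python
def _varint_size(value):
    """Number of bytes needed to encode value as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

def compute_sizes(msg, cache):
    """Post-order size pass over a toy message.

    A message here is a list of (field_num, value) entries, where value
    is an int (a varint field) or another list (a submessage).  Each
    message's byte size is computed exactly once and stored in cache,
    keyed by id(msg).
    """
    total = 0
    for field_num, value in msg:
        key_size = _varint_size(field_num << 3)  # wire type 0 assumed for ints
        if isinstance(value, list):
            child = compute_sizes(value, cache)
            # key + length prefix + submessage body
            total += key_size + _varint_size(child) + child
        else:
            total += key_size + _varint_size(value)
    cache[id(msg)] = total
    return total
```

For example, a message with varint field 1 = 150 measures 3 bytes (1 key byte + 2 varint bytes), and wrapping it as a submessage in field 3 adds a key byte and a length byte for 5 total.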
Re: Slicing support in Python
On Sat, Dec 13, 2008 at 5:09 PM, Petar Petrov pesho.pet...@gmail.com wrote: On Mon, Dec 8, 2008 at 5:36 PM, Alek Storm alek.st...@gmail.com wrote: Okay, then we just need to cache the size only during serialization. The children's sizes are calculated and stored, then added to the parent's size. Write the parent size, then write the parent, then the child size, then the child, on down the tree. Then it's O(n) (same as we have currently) and no ownership problems, because we can drop the weak reference from child to parent. Would that work?

It may work, but ByteSize is a part of the public interface of the message, so making it slower may not be a good idea. However the parent reference will still be needed.

file.py: m3 = M3() m3.m2.m1.i = 3 m3.HasField('m2') # should be True

How does m3 know if m2 was set? This information is right now provided by the setter of 'i' in m1 (by calling TransitionToNonEmpty on the parent, which calls TransitionToNonEmpty on its parent and so on).

Oops, I wasn't clear. Of course HasField should work for non-repeated fields; I only meant to get rid of the weak reference when the message's parent is a repeated composite field, because HasField isn't used for those, so we don't need it if we cache the size only during serialization. So we get a bunch of benefits in exchange for making a rarely used part of the interface slower, and only when used outside of the internal serialization functions. What do you think?

So the parent references are still needed. Let's keep the slice assignment of repeated scalar fields and just remove the slice assignment of repeated composite fields (I still don't find it useful). E.g. we can keep __getslice__, __delitem__ and __delslice__ for repeated composite fields.

Okay, I'll submit a patch with just those methods. We can definitely agree on that :). The above discussion is separate.
By the way, I think something went wrong with your email - apparently it was sent to the group, but didn't show up there, so I just now found it in my inbox. Weird. Cheers, Alek Storm
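For illustration, the interface the thread settles on (slice reads and deletes allowed, slice assignment rejected) might look like the toy class below. `RepeatedComposite` is a hypothetical stand-in, not the real implementation; note also that in Python 3, __getslice__/__delslice__ no longer exist, and slices arrive at __getitem__/__delitem__ as slice objects.

```python
class RepeatedComposite:
    """Toy model of a repeated composite field's list interface."""
    def __init__(self, items=()):
        self._items = list(items)

    def add(self, item):
        """Append a new element, mirroring the .add() style of the API."""
        self._items.append(item)
        return item

    def __getitem__(self, index):
        # Handles both r[0] and r[1:3]; slices come in as slice objects.
        return self._items[index]

    def __delitem__(self, index):
        # Handles both del r[0] and del r[1:3].
        del self._items[index]

    def __setitem__(self, index, value):
        raise TypeError('item/slice assignment is not supported')

    def __len__(self):
        return len(self._items)
```

Usage: `r[1:]` returns a plain list copy of the tail, `del r[0]` shrinks the container, and `r[0] = x` raises TypeError.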
Re: Generating .proto files from C++ classes
On Jan 7, 4:21 pm, Kenton Varda ken...@google.com wrote: SWIG translates C/C++ APIs into other programming languages. Protocol Buffers is not a programming language, so I don't see the analogy. What would be the protocol buffer equivalent of a C function or a C++ class?

Technically, SWIG generates wrappers around C/C++ APIs from header files so higher-level languages can call them. This is roughly what he wants to do for Protocol Buffers: generate .proto schemas from C++ header files. He might want to look into extending the Boost Serialization library somehow.
Re: Generating .proto files from C++ classes
On Jan 7, 8:18 pm, Kenton Varda ken...@google.com wrote: IMO, there's not much reason to use the protobuf wire format unless you explicitly intend for some users to read/write the format using actual protocol buffers.

Not entirely sure what you mean. This will probably get a lot clearer once we get Mike's requirements.
Re: The way to set value to extension in Protocol buffers
On Jan 9, 1:32 am, chongyc chongy...@gmail.com wrote: I defined the following proto:

message Ch { required int32 offset = 1; }
message Foo { required string cmd = 1; extensions 1000 to max; }
message Bar { extend Foo { required Ch ck = 1000; } }

Then I generated C++ code from it. It's OK. I am going to set a value on the extension ck. I compiled the C++ source below:

term2relay::Foo f;
term2relay::Bar b;
term2relay::Ch c;
c.set_offset(100);
f.SetExtension(Bar::ck, c);

But compiling failed on the line f.SetExtension(Bar::ck, c); How to do it? Please help me.

You don't need to create a Ch instance yourself - it's done for you when you create a new Foo instance, just as with a regular message field. Instead, you need to get a mutable pointer to your Ch instance. Try this:

term2relay::Foo f;
term2relay::Ch* c = f.MutableExtension(Bar::ck);
c->set_offset(100);

You also need to change ck's label to 'optional'; otherwise you get an assertion error. Kenton, do all composite fields in extensions have to be optional? If so, can we document this?
Re: POSIX long command line arguments
I'm for changing it. Command line flags get deprecated in software all the time.
Re: Packed option for repeated fields?
Just to clarify (because I can't find this addressed anywhere else), the length delimiter for repeated fields will be the byte count of the whole array, not the count of its elements, right? So an array of 3 fixed32's would have length 12, not 3.
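A quick sketch confirms the arithmetic, assuming the packed encoding as documented (one key byte for the field, a varint byte length of the whole payload, then the raw elements back to back). `encode_packed_fixed32` is a made-up helper for the example; it assumes the field number and the payload length each fit in a single varint byte.

```python
import struct

def encode_packed_fixed32(field_num, values):
    """Packed repeated fixed32: key, byte length of the packed payload
    (NOT the element count), then the little-endian elements."""
    payload = b''.join(struct.pack('<I', v) for v in values)
    key = (field_num << 3) | 2      # wire type 2: length-delimited
    # Assumes key and len(payload) each fit in one varint byte (< 128).
    return bytes([key, len(payload)]) + payload
```

For 3 fixed32 values the length byte is indeed 12 (3 elements x 4 bytes), for a 14-byte record overall.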
Re: Data Tunnel from C++ to Python
I'd love to help, but I need more detail about exactly what each program does. Is the C++ program the backend, while the Python one is the frontend? What are the inputs and outputs? Because right now I don't see any need to do inter-process communication, or have separate processes at all - it seems like one just needs to have its interface exposed to the other.
Re: Data Tunnel from C++ to Python
On Jan 27, 1:17 pm, Topher Brown topher...@gmail.com wrote: I'm not writing a finished program- I'm trying to write a tool for our use while we are developing. We are working with large arrays (stored in protobuf) in C++ and want to plot them, using various python tools, while we are developing- to find bugs and other abnormalities in the data. So the idea is to be able to call this tool to send the data to python, then decide on how to plot it.

That still sounds like a great candidate for embedding Python. Why do you say you can't do it? You can easily expose C++ message objects to Python code using Boost, or serialize the C++ object, pass the data over the C++/Python boundary, then deserialize it on the other side. However, if you're dead set on IPC, I checked, and Windows does indeed have FIFOs.

Sorry if I'm not making myself clear, but I appreciate your help.

No problem :). I just wanted to fully understand what you were trying to do.
Re: Serialization performance comparison with Boost.serialization
I think Yingfeng is referring to the archive formats described here: http://www.boost.org/doc/libs/1_38_0/libs/serialization/doc/archives.html#archive_models. The binary format, however, appears to be non-portable, so it doesn't seem to serve the same purpose as Protocol Buffers, and should be faster anyway, since it encodes directly to native types. -- Alek Storm
Re: New line characters removal
I think you're both right. Marc is talking about fields of type 'string' that happen to include newlines, because that's what the user put in them. I'd guess the reason Shirish is seeing newlines all over is because he has a field optional string foo = 1; somewhere. When encoded, its key byte is binary 1010 - decimal 10, the ASCII newline - followed by the field's value. And who knows, that value could contain newlines :) -- Alek Storm
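The arithmetic behind that guess can be checked directly: the key byte for field number 1 with wire type 2 (length-delimited, which strings use) is (field_number << 3) | wire_type.

```python
# Field number 1, wire type 2 (length-delimited, used for strings):
tag = (1 << 3) | 2
assert tag == 10            # binary 1010
assert bytes([tag]) == b'\n'  # decimal 10 is the ASCII newline
```

So every occurrence of such a field in the serialized bytes starts with a literal '\n'.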
Re: Forward/backwards compatibility - slightly OT
Could you clarify a little more? I'd be happy to help, but you have tried Googling this, right?

On Mon, Jun 8, 2009 at 6:54 PM, Alain M. ala...@pobox.com wrote: Hi, One of the big advantages of ProtoBuf is the ability to make communications forward *and* backward compatible between versions. I would like to study the matter a little more, preferably not directly related to PB, but in a neutral background (even XML could be). Can anyone send some reference about this topic? Thanks, Alain Mouette

-- Alek Storm
Re: JavaScript
That one doesn't look complete. I've got one that almost is.

On Thu, Jun 25, 2009 at 3:53 PM, Marc Gravell marc.grav...@gmail.com wrote: I haven't tried it, but http://code.google.com/p/protobuf/wiki/OtherLanguages lists javascript support here: http://code.google.com/p/protobuf-js/ (this is one of many unofficial independent implementations - not google's; don't blame them ;-p) Marc

-- Alek Storm
Re: where are the examples
Google uses its own internal RPC implementation, and I don't think we can endorse a particular third-party one as better than the others. I'd tell you which one I personally found most beneficial, but I have no experience with any of them. Cheers, Alek

On Wed, Jul 1, 2009 at 10:18 AM, J.V. jvsr...@gmail.com wrote: thanks, which product(s) does Google use internally or find most beneficial? Kenton Varda wrote: You could look at one of the open source RPC implementations listed here: http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns#RPC_Implementations