The context is that message-typed fields (and string-typed fields) are encoded with a length prefix meaning "this many of the bytes that follow are this message". On the wire that prefix is a varint, and implementations read it as an int32. From a binary wire format point of view it's not possible to encode a nested message larger than that, and that's an ecosystem-wide implication.
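To make the length-prefix point concrete, here is a minimal C++ sketch of how a length-delimited field is laid out on the wire: a varint tag, a varint length, then the payload bytes. This is not the protobuf library's code. The helper names `AppendVarint`, `FitsInLengthPrefix` and `AppendLengthDelimited` are hypothetical and exist only for illustration, and the int32 ceiling is modeled here as an explicit check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper (not a protobuf API): append `value` as a base-128 varint.
void AppendVarint(std::uint64_t value, std::string& out) {
  while (value >= 0x80) {
    // Low 7 bits go out first, with the continuation bit set.
    out.push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<char>(value));
}

// Hypothetical helper: true if a payload of this size can be described by a
// length prefix that readers interpret as a signed 32-bit integer.
constexpr bool FitsInLengthPrefix(std::uint64_t payload_size) {
  return payload_size <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// Hypothetical helper: append one length-delimited field (wire type 2) to `out`.
// Returns false and leaves `out` untouched if the payload is too large.
bool AppendLengthDelimited(std::uint32_t field_number,
                           const std::string& payload, std::string& out) {
  if (!FitsInLengthPrefix(payload.size())) return false;
  const std::uint64_t tag = (static_cast<std::uint64_t>(field_number) << 3) | 2;
  AppendVarint(tag, out);             // field number + wire type
  AppendVarint(payload.size(), out);  // the length prefix discussed above
  out.append(payload);                // the nested message or string bytes
  return true;
}
```

For example, field 1 carrying the three bytes `abc` would be appended as `0x0A 0x03 'a' 'b' 'c'`. A top-level message has no enclosing tag and length, which is why the same ceiling does not apply to it at the wire-format level.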
This leaves some wiggle room though. Notably, top-level messages are not encoded with a length prefix, which means they don't have any such technical constraint. More notably, if you just construct a message in memory and then call some setters, it will build up an arbitrarily large message.

> May be a bit of context here would help, I am coming from the point of view https://groups.google.com/g/protobuf/c/vvP4uajRE60
> If the potential fix for it was to set limit to 2g in message_lite.c,

Without other context and doing more archeology, I actually suspect the 'attack' was more that, e.g., a sufficiently smart attacker could send a string whose length is "2GB minus one byte", knowing that the service boxes up that input in a protobuf message (adding a few bytes of overhead) and then encodes that to the next backend server.

And the fix for the C++ issue back then was not simply to try to enforce a conceptual limit of 2GB; it instead required changing the C++ API to use a `long` (int64) for the encoded size of messages instead of an int32 (the size getter is called `ByteSizeLong()`).

That made it much easier to write correct behavior against 2GB limits. When you have an `int EncodedLength()` function and you, e.g., have 10 strings that are each 512MB, set them all as separate fields on the same parent message, and then ask what the `int` serialized size should be, there's no way to handle it gracefully: the true total is about 5GB, which an `int` cannot represent. By making it a `long` instead, it is able to return the actual size without a 2GB limit, and then if you try to serialize a message where that size is too large, it will fail to serialize (serialize has a bool return value on it).

On Fri, Jul 4, 2025 at 10:40 AM 'Somak Dutta' via Protocol Buffers <[email protected]> wrote:

> Exactly, could not agree more.
> There are currently limits set to Integer.MAX_VALUE in CodedInputStream.
>
> Maybe a bit of context here would help. I am coming from the point of view of https://groups.google.com/g/protobuf/c/vvP4uajRE60
>
> If the potential fix for it was to set the limit to 2g in message_lite.c, then in a memory-safe language like Java it defaults to 2g anyway. I wonder whether the vulnerability data out in the world that marks Java as impacted by the vulnerability is really overestimating.
>
> ```
>
> Somak Dutta
> Jul 3, 2025, 1:51:24 PM (yesterday)
>
> to Protocol Buffers
> Hi,
>
> I am writing to ask about the vulnerability reported as GHSA-jwvw-v7c5-m82h <https://github.com/advisories/GHSA-jwvw-v7c5-m82h> for protobuf-java <https://mvnrepository.com/artifact/com.google.protobuf/protobuf-java>, which specifically talks about "*protobuf allows remote authenticated attackers to cause a heap-based buffer overflow.*"
>
> Specifically, I want to ask about earlier versions < 3.4.0.
> Take for example version 2.5.0. Based on all the code I see for CodedInputStream <https://github.com/protocolbuffers/protobuf/blob/v2.5.0/java/src/main/java/com/google/protobuf/CodedInputStream.java>:
> - methods such as readRawBytes/refillBuffer, which either copy to/from or resize the buffer, are all pretty safe from integer overflows.
> - there is also a slow path, where the buffer is read in chunks to potentially prevent out-of-memory issues.
>
> First Question:
> I am not seeing any evidence that the package can be vulnerable to buffer overflow issues.
> Additionally, given that Java is a memory-safe language, I fail to see how the Java ecosystem is susceptible to the aforementioned vulnerability.
>
> Second Question:
> There is a related question along the same lines here: https://github.com/protocolbuffers/protobuf/issues/760?reload=1#issuecomment-847162817
> The potential fix also suggests the issue might be present only in C/C++ ecosystems.
>
> ```
>
> Regards,
> Somak
>
> On Friday, July 4, 2025 at 3:27:21 PM UTC+5:30 Cassondra Foesch wrote:
>
>> I’m pretty sure that since 2 GiB is the maximum value an int32 could carry, that is where the requirement is coming from. It’s entirely possible that it is not actually enforced across the whole ecosystem, but is essentially enforced by “if you exceed this boundary, some code will not work with your protobuf.”
>>
>> Like, for instance, it is impossible for a 32-bit Golang implementation to deal with more than 2 GiB of data in a single slice. (Since the length of the slice is stored as a 32-bit signed integer.)
>>
>> On Thu, Jul 3, 2025 at 08:21, 'Somak Dutta' via Protocol Buffers <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> From https://protobuf.dev/programming-guides/proto-limits/ I understand that, across all ecosystems:
>>>
>>> Any proto in serialized form must be <2GiB, as that is the maximum size supported by all implementations. It’s recommended to bound request and response sizes.
>>>
>>> However, I wanted to check where exactly the limitation is set up, specifically in the protobuf-java library.
>>>
>>> I can see safety checks only in the message_lite.cc file, but I don't think this would be reflected across ecosystems?
>>>
>>> if (size > INT_MAX) {
>>>   GOOGLE_LOG(ERROR) << "Exceeded maximum protobuf size of 2GB: " << size;
>>>   return false;
>>> }
>>>
>>> Regards
>>>
>>> *Confidentiality Notice: This email and any attachments are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify the sender immediately and delete it from your system. Unauthorized use, disclosure, or copying of this email or its contents is strictly prohibited.*
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion visit https://groups.google.com/d/msgid/protobuf/e0d724d8-2a45-4ef1-aaac-c3e6d1077306n%40googlegroups.com
