Re: [protobuf] Re: Discrepancy in generated code of unsafe_arena_release_foo

2020-05-05 Thread 'Josh Haberman' via Protocol Buffers
On Tue, May 5, 2020 at 4:20 PM King Kong  wrote:

> Thanks for your reply.
>
> Yes, I have been using arena, but I wanted to do better than batching heap
> allocation, when possible. The "scoped leasing" use case I described above
> achieve 0 heap allocation all the time. I guess my code could use
> GetArena() at runtime to decide how the "leasing" object is created and
> which set of set&release APIs to use.
>

If you want to avoid any heap allocation, you could give the arena an
initial block of memory from the stack.

If you are trying to optimize, keep in mind that most of the savings from
arenas come at deallocation time, where arena-allocated protos can simply
skip the destructor. If you are using heap-allocated protos, the destructor
has to check every message field to see if it needs to be deleted.

Arenas have given such performance benefits that we are focused on that use
case. If we loosen the checks in unsafe_arena functions, it could force us
to compromise later to keep supporting the heap-allocated case. I don't
think we want to loosen these restrictions.


> On Tuesday, May 5, 2020 at 2:25:05 PM UTC-7, Josh Haberman wrote:
>>
>> The documentation for unsafe_arena_release_foo() says:
>>
>> > This method should only be used when the parent message is on the arena
>> and the submessage is on the same arena, or an arena with equivalent
>> lifetime.
>>
>> The DCHECK() you highlighted is checking the first part of this
>> requirement.
>>
>> Stack/heap-allocated messages must always uniquely own their
>> sub-messages, which must always be on the heap. Protobuf doesn't support
>> the use case you are describing, where a message temporarily points to a
>> message it doesn't own. If we allowed that, protobuf itself would crash if
>> a message in this state was ever destroyed.
>>
>> If you are trying to avoid heap allocation overhead, the recommended
>> solution is arenas.
>>
>> On Tuesday, May 5, 2020 at 12:44:32 PM UTC-7, King Kong wrote:
>>>
>>> I am using unsafe_arena_set_allocated_* and unsafe_arena_release_* APIs
>>> to avoid
>>> unnecessary allocations when the sub-objects are only used in a function
>>> scope,
>>> unsafe_arena_release_* is always called before leaving the scope. At the
>>> beginning,
>>>
>>> I chose to always use the arena versions of unsafe APIs because the
>>> non-arena version
>>>
>>> unsafe_release does extra allocation when the object is allocated on
>>> arena, and the arena
>>>
>>> version does not check if the object is allocated in arena, based on
>>> generated code like
>>>
>>> the following:
>>>
>>>
>>> inline foo* outter1::unsafe_arena_release_foo() {
>>> foo* temp = foo_;
>>> foo_ = nullptr;
>>> return temp;
>>> }
>>>
>>>
>>> However, I later also saw the following the generated code with arena
>>> check, it means it is not okay to call for heap-allocated objects.
>>>
>>> inline void outter2::unsafe_arena_set_allocated_bar(
>>> std::string* bar) {
>>> GOOGLE_DCHECK(GetArenaNoVirtual() != nullptr);
>>> ... ...
>>> }
>>>
>>>
>>> So, why the discrepancy?
>>>
>>>
>>> The official documentation at
>>> https://developers.google.com/protocol-buffers/docs/reference/arenas says
>>> the correct set of APIs should be used
>>>
>>> according to how to the object is allocated, but having two different
>>> sets of unsafe APIs makes it harder to use, and it makes the code less
>>> robust to future changes. I am wondering if protobuf owners also realize
>>> such an issue and start to remove arena checking gradually? It that the
>>> case? I would like to hear how others think.
>>>
>>>
>>> Thanks.
>>>
>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Protocol Buffers" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/protobuf/C26jc_jrCJQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> protobuf+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/protobuf/bfa10586-05af-430c-b809-5e4c65d43b5c%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/protobuf/CAFFe09gqZJJ40xvdbTNRqQ9%3DaJYxud%3Dtyufg-reYajTOT%2BPDsw%40mail.gmail.com.


Re: [protobuf] Re: Discrepancy in generated code of unsafe_arena_release_foo

2020-05-05 Thread King Kong
As a related issue, I remember reading some comments in protobuf source 
code that says google internal protobuf supports arena allocation that's 
not portable (probably assumes certain std::string implementation), thus it 
is not open sourced, what's the plan for string arena allocation support? 
If I don't care about portability, is it possible to get the non-portable 
solution somehow?

On Tuesday, May 5, 2020 at 7:02:39 PM UTC-7, Josh Haberman wrote:
>
> On Tue, May 5, 2020 at 4:20 PM King Kong > 
> wrote:
>
>> Thanks for your reply.
>>
>> Yes, I have been using arena, but I wanted to do better than batching 
>> heap allocation, when possible. The "scoped leasing" use case I described 
>> above achieve 0 heap allocation all the time. I guess my code could use 
>> GetArena() at runtime to decide how the "leasing" object is created and 
>> which set of set&release APIs to use.
>>
>
> If you want to avoid any heap allocation, you could give the arena an 
> initial block of memory from the stack.
>
> If you are trying to optimize, keep in mind that most of the savings from 
> arenas come at deallocation time, where arena-allocated protos can simply 
> skip the destructor. If you are using heap-allocated protos, the destructor 
> has to check every message field to see if it needs to be deleted.
>
> Arenas have given such performance benefits that we are focused on that 
> use case. If we loosen the checks in unsafe_arena functions, it could force 
> us to compromise later to keep supporting the heap-allocated case. I don't 
> think we want to loosen these restrictions.
>  
>
>> On Tuesday, May 5, 2020 at 2:25:05 PM UTC-7, Josh Haberman wrote:
>>>
>>> The documentation for unsafe_arena_release_foo() says:
>>>
>>> > This method should only be used when the parent message is on the 
>>> arena and the submessage is on the same arena, or an arena with equivalent 
>>> lifetime.
>>>
>>> The DCHECK() you highlighted is checking the first part of this 
>>> requirement.
>>>
>>> Stack/heap-allocated messages must always uniquely own their 
>>> sub-messages, which must always be on the heap. Protobuf doesn't support 
>>> the use case you are describing, where a message temporarily points to a 
>>> message it doesn't own. If we allowed that, protobuf itself would crash if 
>>> a message in this state was ever destroyed.
>>>
>>> If you are trying to avoid heap allocation overhead, the recommended 
>>> solution is arenas.
>>>
>>> On Tuesday, May 5, 2020 at 12:44:32 PM UTC-7, King Kong wrote:

 I am using unsafe_arena_set_allocated_* and unsafe_arena_release_* APIs 
 to avoid
 unnecessary allocations when the sub-objects are only used in a 
 function scope,
 unsafe_arena_release_* is always called before leaving the scope. At 
 the beginning,

 I chose to always use the arena versions of unsafe APIs because the 
 non-arena version 

 unsafe_release does extra allocation when the object is allocated on 
 arena, and the arena

 version does not check if the object is allocated in arena, based on 
 generated code like 

 the following:


 inline foo* outter1::unsafe_arena_release_foo() {
 foo* temp = foo_;
 foo_ = nullptr;
 return temp;
 }


 However, I later also saw the following the generated code with arena 
 check, it means it is not okay to call for heap-allocated objects.

 inline void outter2::unsafe_arena_set_allocated_bar(
 std::string* bar) {
 GOOGLE_DCHECK(GetArenaNoVirtual() != nullptr);
 ... ...
 }


 So, why the discrepancy?


 The official documentation at 
 https://developers.google.com/protocol-buffers/docs/reference/arenas says 
 the correct set of APIs should be used

 according to how to the object is allocated, but having two different 
 sets of unsafe APIs makes it harder to use, and it makes the code less 
 robust to future changes. I am wondering if protobuf owners also realize 
 such an issue and start to remove arena checking gradually? It that the 
 case? I would like to hear how others think.


 Thanks.

>>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Protocol Buffers" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/protobuf/C26jc_jrCJQ/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> prot...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/protobuf/bfa10586-05af-430c-b809-5e4c65d43b5c%40googlegroups.com
>>  
>> 
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving e

Re: [protobuf] Re: Discrepancy in generated code of unsafe_arena_release_foo

2020-05-07 Thread King Kong
I guess my last post wasn't clear enough, so here is the clarification.

With open source version of protobuf, std::string cannot be allocated in 
arena, so set allocated and release APIs are necessary for temporarily 
sharing std::string pointers, according to official documentation and 
generated code, different APIs are needed (unsafe_arena_* vs non-arena), 
depending on how the "borrowing" object is allocated, this leads to fragile 
code. Allowing using one set of APIs in both cases seems a better 
alternative.

If the recommendation is not to use std::string, what's the replacement?
Thanks.

On Tuesday, May 5, 2020 at 8:33:21 PM UTC-7, King Kong wrote:
>
> As a related issue, I remember reading some comments in protobuf source 
> code that says google internal protobuf supports arena allocation that's 
> not portable (probably assumes certain std::string implementation), thus it 
> is not open sourced, what's the plan for string arena allocation support? 
> If I don't care about portability, is it possible to get the non-portable 
> solution somehow?
>
> On Tuesday, May 5, 2020 at 7:02:39 PM UTC-7, Josh Haberman wrote:
>>
>> On Tue, May 5, 2020 at 4:20 PM King Kong  wrote:
>>
>>> Thanks for your reply.
>>>
>>> Yes, I have been using arena, but I wanted to do better than batching 
>>> heap allocation, when possible. The "scoped leasing" use case I described 
>>> above achieve 0 heap allocation all the time. I guess my code could use 
>>> GetArena() at runtime to decide how the "leasing" object is created and 
>>> which set of set&release APIs to use.
>>>
>>
>> If you want to avoid any heap allocation, you could give the arena an 
>> initial block of memory from the stack.
>>
>> If you are trying to optimize, keep in mind that most of the savings from 
>> arenas come at deallocation time, where arena-allocated protos can simply 
>> skip the destructor. If you are using heap-allocated protos, the destructor 
>> has to check every message field to see if it needs to be deleted.
>>
>> Arenas have given such performance benefits that we are focused on that 
>> use case. If we loosen the checks in unsafe_arena functions, it could force 
>> us to compromise later to keep supporting the heap-allocated case. I don't 
>> think we want to loosen these restrictions.
>>  
>>
>>> On Tuesday, May 5, 2020 at 2:25:05 PM UTC-7, Josh Haberman wrote:

 The documentation for unsafe_arena_release_foo() says:

 > This method should only be used when the parent message is on the 
 arena and the submessage is on the same arena, or an arena with equivalent 
 lifetime.

 The DCHECK() you highlighted is checking the first part of this 
 requirement.

 Stack/heap-allocated messages must always uniquely own their 
 sub-messages, which must always be on the heap. Protobuf doesn't support 
 the use case you are describing, where a message temporarily points to a 
 message it doesn't own. If we allowed that, protobuf itself would crash if 
 a message in this state was ever destroyed.

 If you are trying to avoid heap allocation overhead, the recommended 
 solution is arenas.

 On Tuesday, May 5, 2020 at 12:44:32 PM UTC-7, King Kong wrote:
>
> I am using unsafe_arena_set_allocated_* and unsafe_arena_release_* 
> APIs to avoid
> unnecessary allocations when the sub-objects are only used in a 
> function scope,
> unsafe_arena_release_* is always called before leaving the scope. At 
> the beginning,
>
> I chose to always use the arena versions of unsafe APIs because the 
> non-arena version 
>
> unsafe_release does extra allocation when the object is allocated on 
> arena, and the arena
>
> version does not check if the object is allocated in arena, based on 
> generated code like 
>
> the following:
>
>
> inline foo* outter1::unsafe_arena_release_foo() {
> foo* temp = foo_;
> foo_ = nullptr;
> return temp;
> }
>
>
> However, I later also saw the following the generated code with arena 
> check, it means it is not okay to call for heap-allocated objects.
>
> inline void outter2::unsafe_arena_set_allocated_bar(
> std::string* bar) {
> GOOGLE_DCHECK(GetArenaNoVirtual() != nullptr);
> ... ...
> }
>
>
> So, why the discrepancy?
>
>
> The official documentation at 
> https://developers.google.com/protocol-buffers/docs/reference/arenas says 
> the correct set of APIs should be used
>
> according to how to the object is allocated, but having two different 
> sets of unsafe APIs makes it harder to use, and it makes the code less 
> robust to future changes. I am wondering if protobuf owners also realize 
> such an issue and start to remove arena checking gradually? It that the 
> case? I would like to hear how others think.
>
>
> Thanks.
>
 -- 
>>> You rec

Re: [protobuf] Re: Discrepancy in generated code of unsafe_arena_release_foo

2020-05-07 Thread 'Josh Haberman' via Protocol Buffers
The unsafe_arena_* functions for strings are deprecated:

> When arenas are enabled, string and bytes fields generate
unsafe_arena_release_field() and unsafe_arena_set_allocated_field()
methods. Note that these methods are deprecated and will be removed in a
future release. Those methods were added by mistake and offer no
performance advantage over their safe equivalents.

https://developers.google.com/protocol-buffers/docs/reference/arenas#arenastring

We do not have any supported interface for borrowing a std::string and
releasing it later. Our goal in this area is to reduce (and eventually
remove) the use of std::string in generated APIs in favor of string_view,
which could potentially support an accessor like:

// The "foo" field will point to the given string data, which must outlive
the message.
void FooMessage::set_alias_foo(string_view str) {}

On Thu, May 7, 2020 at 11:10 AM King Kong  wrote:

> I guess my last post wasn't clear enough, so here is the clarification.
>
> With open source version of protobuf, std::string cannot be allocated in
> arena, so set allocated and release APIs are necessary for temporarily
> sharing std::string pointers, according to official documentation and
> generated code, different APIs are needed (unsafe_arena_* vs non-arena),
> depending on how the "borrowing" object is allocated, this leads to fragile
> code. Allowing using one set of APIs in both cases seems a better
> alternative.
>
> If the recommendation is not to use std::string, what's the replacement?
> Thanks.
>
> On Tuesday, May 5, 2020 at 8:33:21 PM UTC-7, King Kong wrote:
>>
>> As a related issue, I remember reading some comments in protobuf source
>> code that says google internal protobuf supports arena allocation that's
>> not portable (probably assumes certain std::string implementation), thus it
>> is not open sourced, what's the plan for string arena allocation support?
>> If I don't care about portability, is it possible to get the non-portable
>> solution somehow?
>>
>> On Tuesday, May 5, 2020 at 7:02:39 PM UTC-7, Josh Haberman wrote:
>>>
>>> On Tue, May 5, 2020 at 4:20 PM King Kong  wrote:
>>>
 Thanks for your reply.

 Yes, I have been using arena, but I wanted to do better than batching
 heap allocation, when possible. The "scoped leasing" use case I described
 above achieve 0 heap allocation all the time. I guess my code could use
 GetArena() at runtime to decide how the "leasing" object is created and
 which set of set&release APIs to use.

>>>
>>> If you want to avoid any heap allocation, you could give the arena an
>>> initial block of memory from the stack.
>>>
>>> If you are trying to optimize, keep in mind that most of the savings
>>> from arenas come at deallocation time, where arena-allocated protos can
>>> simply skip the destructor. If you are using heap-allocated protos, the
>>> destructor has to check every message field to see if it needs to be
>>> deleted.
>>>
>>> Arenas have given such performance benefits that we are focused on that
>>> use case. If we loosen the checks in unsafe_arena functions, it could force
>>> us to compromise later to keep supporting the heap-allocated case. I don't
>>> think we want to loosen these restrictions.
>>>
>>>
 On Tuesday, May 5, 2020 at 2:25:05 PM UTC-7, Josh Haberman wrote:
>
> The documentation for unsafe_arena_release_foo() says:
>
> > This method should only be used when the parent message is on the
> arena and the submessage is on the same arena, or an arena with equivalent
> lifetime.
>
> The DCHECK() you highlighted is checking the first part of this
> requirement.
>
> Stack/heap-allocated messages must always uniquely own their
> sub-messages, which must always be on the heap. Protobuf doesn't support
> the use case you are describing, where a message temporarily points to a
> message it doesn't own. If we allowed that, protobuf itself would crash if
> a message in this state was ever destroyed.
>
> If you are trying to avoid heap allocation overhead, the recommended
> solution is arenas.
>
> On Tuesday, May 5, 2020 at 12:44:32 PM UTC-7, King Kong wrote:
>>
>> I am using unsafe_arena_set_allocated_* and unsafe_arena_release_*
>> APIs to avoid
>> unnecessary allocations when the sub-objects are only used in a
>> function scope,
>> unsafe_arena_release_* is always called before leaving the scope. At
>> the beginning,
>>
>> I chose to always use the arena versions of unsafe APIs because the
>> non-arena version
>>
>> unsafe_release does extra allocation when the object is allocated on
>> arena, and the arena
>>
>> version does not check if the object is allocated in arena, based on
>> generated code like
>>
>> the following:
>>
>>
>> inline foo* outter1::unsafe_arena_release_foo() {
>> foo* temp = foo_;
>> foo_ = nullptr;
>> retur