Re: RFC: Allow moved-from strings to be non-empty
On Sat, 27 Oct 2018 at 01:27, Joe Buck wrote: > > The reason move constructors were introduced was to speed up code in cases > where an object > Is copied and the copy is no longer needed. It is unfortunate that there may > now be code out > there that relies on accidental properties of library implementations. It > would be best if the > Implementation is not constrained. Unless the standard mandates that, after > a string is moved, > the string is empty, the user should only be able to assume that it is in > some consistent but > unspecified state. Otherwise we pay a performance penalty forever. > > If the standard explicitly states that the argument to the move constructor > is defined to be > empty after the call, we're stuck. It certainly doesn't specify so for an SSO string, so we're not stuck. On the other hand, we already get a speed-up, it's just not as much of a speed-up as it can be. What I really loathe is the potential implementation divergence; it's all good for the priesthood to refer to the standard and say "you shouldn't have done that", but that's, as a good friend of mine provided as a phrasing on a different manner, spectacularly unhelpful.
RE: RFC: Allow moved-from strings to be non-empty
The reason move constructors were introduced was to speed up code in cases where an object Is copied and the copy is no longer needed. It is unfortunate that there may now be code out there that relies on accidental properties of library implementations. It would be best if the Implementation is not constrained. Unless the standard mandates that, after a string is moved, the string is empty, the user should only be able to assume that it is in some consistent but unspecified state. Otherwise we pay a performance penalty forever. If the standard explicitly states that the argument to the move constructor is defined to be empty after the call, we're stuck.
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018, Ville Voutilainen wrote: On Fri, 26 Oct 2018 at 12:55, Jonathan Wakely wrote: Why not? There are already several, and it helps find bugs. Maybe you could convince libc++ to change as well if you want to keep the behavior the same? What bugs? Assuming the string is empty after a move and appending to it without calling clear() first. I find it bloody unfortunate that the standard considers that a bug. It seems quite convenient to me that it's possible to have a bag of strings, move some of them out to be processed, and be able to tell the processed ones from the unprocessed ones without having to separately clear them when moving. We all seem to want different things from move: 1) copy, possibly with a speed up because I don't care what the source looks like afterwards. That's what was standardized, and for a small string that means that moving it should just copy. 2) move and clear 3) move and destroy (most likely many others) The convenience does not seem that great to me, especially if it has a performance cost. It does slightly concern me that some users might actually semantically expect a moved-from string to be empty, even though that's not guaranteed, although for non-SSO cases it *is* guaranteed. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). Sigh. Apparently it isn't, because the standard doesn't bother placing complexity requirements on string constructors. Writing 5 bytes into the SSO buffer would be constant complexity :-P Indeed it would, but the standard doesn't seem to have complexity requirements on string constructors at all. If it did, moving a non-sso string would *have to* steal from the source, and the sso case would not have to do that, but could. I am not sure what you are saying exactly, but I think "noexcept" is (kind of) playing the role of a complexity requirement here. -- Marc Glisse
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018 at 12:55, Jonathan Wakely wrote: > >> Why not? There are already several, and it helps find bugs. Maybe you > >> could convince libc++ to change as well if you want to keep the behavior > >> the same? > > > >What bugs? > > Assuming the string is empty after a move and appending to it without > calling clear() first. I find it bloody unfortunate that the standard considers that a bug. It seems quite convenient to me that it's possible to have a bag of strings, move some of them out to be processed, and be able to tell the processed ones from the unprocessed ones without having to separately clear them when moving. > >> > It does slightly concern me that some users might > >> > actually semantically expect a moved-from string to be empty, even > >> > though that's not guaranteed, although for non-SSO cases > >> > it *is* guaranteed. > >> > >> Is it? In debug mode, I'd be tempted to leave the string as "moved" (size > >> 5, short string so there is no allocation). > > > >Sigh. Apparently it isn't, because the standard doesn't bother placing > >complexity > >requirements on string constructors. > > Writing 5 bytes into the SSO buffer would be constant complexity :-P Indeed it would, but the standard doesn't seem to have complexity requirements on string constructors at all. If it did, moving a non-sso string would *have to* steal from the source, and the sso case would not have to do that, but could. > >Even so, I'd prefer string acting > >like vector, > >so that it will leave the source of a move in an empty state, rather > >than an unspecified > >state. > > It's not guaranteed for vector either. An allocator with POCMA==false > doesn't propagate on move assignment and if the source object's > allocator isn't equal to the target's it will copy, and there's no > guarantee the source will be empty. Right, I was talking about homogeneous vectors; it is well-known that non-propagating allocators don't move. Except when they do (on move construction).
Re: RFC: Allow moved-from strings to be non-empty
On 26/10/18 08:25 +0200, Marc Glisse wrote: On Fri, 26 Oct 2018, Jonathan Wakely wrote: For the libc++ string zeroing the length of a small string happens to be faster. Ah, yes, of course. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). That's not a bad idea. Although we can't do it for std::wstring and std::u32string, as their small string buffer is *very* small. "N/A"? The proposition was only semi-serious. By the way, I was surprised by the formula for the size of the buffer. It often has size 16, but for a _CharT of size 3 and alignment 1 (unlikely I guess), it has size 18. Oops, that's not intentional.
Re: RFC: Allow moved-from strings to be non-empty
On 26/10/18 12:16 +0300, Ville Voutilainen wrote: On Fri, 26 Oct 2018 at 01:42, Marc Glisse wrote: On Fri, 26 Oct 2018, Ville Voutilainen wrote: > I would rather not introduce a behavioral difference between us and > libc++. Why not? There are already several, and it helps find bugs. Maybe you could convince libc++ to change as well if you want to keep the behavior the same? What bugs? Assuming the string is empty after a move and appending to it without calling clear() first. > It does slightly concern me that some users might > actually semantically expect a moved-from string to be empty, even > though that's not guaranteed, although for non-SSO cases > it *is* guaranteed. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). Sigh. Apparently it isn't, because the standard doesn't bother placing complexity requirements on string constructors. Writing 5 bytes into the SSO buffer would be constant complexity :-P Even so, I'd prefer string acting like vector, so that it will leave the source of a move in an empty state, rather than an unspecified state. It's not guaranteed for vector either. An allocator with POCMA==false doesn't propagate on move assignment and if the source object's allocator isn't equal to the target's it will copy, and there's no guarantee the source will be empty. This would be a conforming change to our vector: --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -1793,7 +1793,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER // so we need to individually move each element. this->assign(std::__make_move_if_noexcept_iterator(__x.begin()), std::__make_move_if_noexcept_iterator(__x.end())); - __x.clear(); } } #endif That might even have a bigger performance benefit, because clearing a vector runs destructors, it doesn't just set the length and write a null terminator. If you're using a vector as a buffer of objects and the first thing you do after v1=std::move(v2) is v2.resize(n) then it's a pessimization to have cleared it. Despite the standard not requiring that, it's more useful programmatically to have the empty state than the unspecified state, especially when the state is empty in some cases anyway.
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018 at 01:42, Marc Glisse wrote: > > On Fri, 26 Oct 2018, Ville Voutilainen wrote: > > > I would rather not introduce a behavioral difference between us and > > libc++. > > Why not? There are already several, and it helps find bugs. Maybe you > could convince libc++ to change as well if you want to keep the behavior > the same? What bugs? > > It does slightly concern me that some users might > > actually semantically expect a moved-from string to be empty, even > > though that's not guaranteed, although for non-SSO cases > > it *is* guaranteed. > > Is it? In debug mode, I'd be tempted to leave the string as "moved" (size > 5, short string so there is no allocation). Sigh. Apparently it isn't, because the standard doesn't bother placing complexity requirements on string constructors. Even so, I'd prefer string acting like vector, so that it will leave the source of a move in an empty state, rather than an unspecified state. Despite the standard not requiring that, it's more useful programmatically to have the empty state than the unspecified state, especially when the state is empty in some cases anyway.
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018, Jonathan Wakely wrote: For the libc++ string zeroing the length of a small string happens to be faster. Ah, yes, of course. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). That's not a bad idea. Although we can't do it for std::wstring and std::u32string, as their small string buffer is *very* small. "N/A"? The proposition was only semi-serious. By the way, I was surprised by the formula for the size of the buffer. It often has size 16, but for a _CharT of size 3 and alignment 1 (unlikely I guess), it has size 18. -- Marc Glisse
Re: RFC: Allow moved-from strings to be non-empty
On 26/10/18 00:17 +0100, Jonathan Wakely wrote: On 26/10/18 00:42 +0200, Marc Glisse wrote: On Fri, 26 Oct 2018, Ville Voutilainen wrote: I would rather not introduce a behavioral difference between us and libc++. Why not? There are already several, and it helps find bugs. Maybe you could convince libc++ to change as well if you want to keep the behavior the same? For the libc++ string zeroing the length of a small string happens to be faster. See Howard's answer at the stackoverflow link I gave. He says: 'So in summary, the setting of the source to an empty string is necessary in "long mode" to transfer ownership of the pointer, and also necessary in short mode for performance reasons to avoid a broken pipeline.' That's not the case for our SSO string, which wasn't designed from the ground up to make this move constructor optimal. We can choose whether to leave moved-from strings empty or not. I'm not persuaded that preserving portable behaviour between implementations is useful here. This is unspecified behaviour. I agree with Marc that introducing a difference helps find bugs, and teaches people not to rely on unspecified properties. I haven't been able to measure any performance difference, and in many cases the writes to the rvalue will be dead stores if the object is about to be destroyed anyway. But the code does look slightly better (caveat: I don't know what I'm talking about). Before: leaq16(%rdi), %rdx movq%rdi, %rax movq%rdx, (%rdi) movq(%rsi), %rcx leaq16(%rsi), %rdx cmpq%rdx, %rcx je .L5 movq%rcx, (%rdi) movq16(%rsi), %rcx movq%rcx, 16(%rdi) .L3: movq8(%rsi), %rcx movq%rdx, (%rsi) movq$0, 8(%rsi) movq%rcx, 8(%rax) movb$0, 16(%rsi) ret .L5: movdqu 16(%rsi), %xmm0 movups %xmm0, 16(%rdi) jmp .L3 After: leaq16(%rdi), %rdx movq%rdi, %rax movq%rdx, (%rdi) movq8(%rsi), %rdx movq(%rsi), %rcx movq%rdx, 8(%rdi) leaq16(%rsi), %rdx cmpq%rdx, %rcx je .L5 movq%rcx, (%rdi) movq16(%rsi), %rcx movq%rdx, (%rsi) movq%rcx, 16(%rdi) movq$0, 8(%rsi) movb$0, 16(%rsi) ret .L5: movdqu 16(%rsi), %xmm0 movups %xmm0, 16(%rdi) ret It does slightly concern me that some users might actually semantically expect a moved-from string to be empty, even though that's not guaranteed, although for non-SSO cases it *is* guaranteed. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). That's not a bad idea. Although we can't do it for std::wstring and std::u32string, as their small string buffer is *very* small.
Re: RFC: Allow moved-from strings to be non-empty
On 26/10/18 00:42 +0200, Marc Glisse wrote: On Fri, 26 Oct 2018, Ville Voutilainen wrote: I would rather not introduce a behavioral difference between us and libc++. Why not? There are already several, and it helps find bugs. Maybe you could convince libc++ to change as well if you want to keep the behavior the same? For the libc++ string zeroing the length of a small string happens to be faster. See Howard's answer at the stackoverflow link I gave. He says: 'So in summary, the setting of the source to an empty string is necessary in "long mode" to transfer ownership of the pointer, and also necessary in short mode for performance reasons to avoid a broken pipeline.' That's not the case for our SSO string, which wasn't designed from the ground up to make this move constructor optimal. We can choose whether to leave moved-from strings empty or not. I'm not persuaded that preserving portable behaviour between implementations is useful here. This is unspecified behaviour. I agree with Marc that introducing a difference helps find bugs, and teaches people not to rely on unspecified properties. I haven't been able to measure any performance difference, and in many cases the writes to the rvalue will be dead stores if the object is about to be destroyed anyway. But the code does look slightly better (caveat: I don't know what I'm talking about). Before: leaq16(%rdi), %rdx movq%rdi, %rax movq%rdx, (%rdi) movq(%rsi), %rcx leaq16(%rsi), %rdx cmpq%rdx, %rcx je .L5 movq%rcx, (%rdi) movq16(%rsi), %rcx movq%rcx, 16(%rdi) .L3: movq8(%rsi), %rcx movq%rdx, (%rsi) movq$0, 8(%rsi) movq%rcx, 8(%rax) movb$0, 16(%rsi) ret .L5: movdqu 16(%rsi), %xmm0 movups %xmm0, 16(%rdi) jmp .L3 After: leaq16(%rdi), %rdx movq%rdi, %rax movq%rdx, (%rdi) movq8(%rsi), %rdx movq(%rsi), %rcx movq%rdx, 8(%rdi) leaq16(%rsi), %rdx cmpq%rdx, %rcx je .L5 movq%rcx, (%rdi) movq16(%rsi), %rcx movq%rdx, (%rsi) movq%rcx, 16(%rdi) movq$0, 8(%rsi) movb$0, 16(%rsi) ret .L5: movdqu 16(%rsi), %xmm0 movups %xmm0, 16(%rdi) ret It does slightly concern me that some users might actually semantically expect a moved-from string to be empty, even though that's not guaranteed, although for non-SSO cases it *is* guaranteed. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). That's not a bad idea.
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018, Ville Voutilainen wrote: I would rather not introduce a behavioral difference between us and libc++. Why not? There are already several, and it helps find bugs. Maybe you could convince libc++ to change as well if you want to keep the behavior the same? It does slightly concern me that some users might actually semantically expect a moved-from string to be empty, even though that's not guaranteed, although for non-SSO cases it *is* guaranteed. Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 5, short string so there is no allocation). -- Marc Glisse
Re: RFC: Allow moved-from strings to be non-empty
On Thu, 25 Oct 2018, Jonathan Wakely wrote: When an SSO string is contained in the small string buffer or has an unequal allocator a move operation performs a copy, leaving the original data in the moved-from string. Setting the length of the moved-from string to zero is not required, so we can avoid two writes (to the length member and to set the first character to nul) by leaving the moved-from string unchanged. This might be surprising to some users, who (wrongly) expect a string to always be empty after a move. Is that acceptable? It's worth noting that the current behaviour, where we do more work than required, also surprises some people, e.g. https://stackoverflow.com/q/52696413/981959 Some tests need to be adjusted due to the new behaviour, but they have comments saying they're testing specific implementation details, so I don't think needing to adjust them is an argument against making the change: // NOTE: This makes use of the fact that we know how moveable // is implemented on string (via swap). If the implementation changed // this test may begin to fail. Should we make this change? I didn't read the patch, but from the description above, I say yes. Interesting that I got my relocate patch in before, because after your patch relocate probably helps a little less than before. -- Marc Glisse
Re: RFC: Allow moved-from strings to be non-empty
On Fri, 26 Oct 2018 at 00:54, Jonathan Wakely wrote: > > When an SSO string is contained in the small string buffer or has an > unequal allocator a move operation performs a copy, leaving the original > data in the moved-from string. Setting the length of the moved-from > string to zero is not required, so we can avoid two writes (to the > length member and to set the first character to nul) by leaving the > moved-from string unchanged. > > This might be surprising to some users, who (wrongly) expect a string > to always be empty after a move. Is that acceptable? > > It's worth noting that the current behaviour, where we do more work > than required, also surprises some people, e.g. > https://stackoverflow.com/q/52696413/981959 > > Some tests need to be adjusted due to the new behaviour, but they have > comments saying they're testing specific implementation details, so I > don't think needing to adjust them is an argument against making the > change: > > // NOTE: This makes use of the fact that we know how moveable > // is implemented on string (via swap). If the implementation changed > // this test may begin to fail. > > > Should we make this change? I would rather not introduce a behavioral difference between us and libc++. It does slightly concern me that some users might actually semantically expect a moved-from string to be empty, even though that's not guaranteed, although for non-SSO cases it *is* guaranteed.