Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Ville Voutilainen
On Sat, 27 Oct 2018 at 01:27, Joe Buck  wrote:
>
> The reason move constructors were introduced was to speed up code in cases 
> where an object
> Is copied and the copy is no longer needed.  It is unfortunate that there may 
> now be code out
> there that relies on accidental properties of library implementations.  It 
> would be best if the
> Implementation is not constrained.  Unless the standard mandates that, after 
> a string is moved,
> the string is empty, the user should only be able to assume that it is in 
> some consistent but
> unspecified state.  Otherwise we pay a performance penalty forever.
>
> If the standard explicitly states that the argument to the move constructor 
> is defined to be
> empty after the call, we're stuck.

It certainly doesn't specify so for an SSO string, so we're not stuck.
On the other hand, we already get
a speed-up, it's just not as much of a speed-up as it can be. What I
really loathe is the potential implementation
divergence; it's all good for the priesthood to refer to the standard
and say "you shouldn't have done that", but
that's, as a good friend of mine provided as a phrasing on a different
manner, spectacularly unhelpful.


RE: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Joe Buck
The reason move constructors were introduced was to speed up code in cases 
where an object
Is copied and the copy is no longer needed.  It is unfortunate that there may 
now be code out
there that relies on accidental properties of library implementations.  It 
would be best if the
Implementation is not constrained.  Unless the standard mandates that, after a 
string is moved,
the string is empty, the user should only be able to assume that it is in some 
consistent but
unspecified state.  Otherwise we pay a performance penalty forever. 

If the standard explicitly states that the argument to the move constructor is 
defined to be
empty after the call, we're stuck.



Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Marc Glisse

On Fri, 26 Oct 2018, Ville Voutilainen wrote:


On Fri, 26 Oct 2018 at 12:55, Jonathan Wakely  wrote:

Why not? There are already several, and it helps find bugs. Maybe you
could convince libc++ to change as well if you want to keep the behavior
the same?


What bugs?


Assuming the string is empty after a move and appending to it without
calling clear() first.


I find it bloody unfortunate that the standard considers that a bug. It 
seems quite convenient to me that it's possible to have a bag of 
strings, move some of them out to be processed, and be able to tell the 
processed ones from the unprocessed ones without having to separately 
clear them when moving.


We all seem to want different things from move:

1) copy, possibly with a speed up because I don't care what the source 
looks like afterwards. That's what was standardized, and for a small 
string that means that moving it should just copy.


2) move and clear

3) move and destroy

(most likely many others)

The convenience does not seem that great to me, especially if it has a 
performance cost.



It does slightly concern me that some users might
actually semantically expect a moved-from string to be empty, even
though that's not guaranteed, although for non-SSO cases
it *is* guaranteed.


Is it? In debug mode, I'd be tempted to leave the string as "moved" (size
5, short string so there is no allocation).


Sigh. Apparently it isn't, because the standard doesn't bother placing
complexity
requirements on string constructors.


Writing 5 bytes into the SSO buffer would be constant complexity :-P


Indeed it would, but the standard doesn't seem to have complexity 
requirements on string constructors at all. If it did, moving a non-sso 
string would *have to* steal from the source, and the sso case would not 
have to do that, but could.


I am not sure what you are saying exactly, but I think "noexcept" is (kind 
of) playing the role of a complexity requirement here.


--
Marc Glisse


Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Ville Voutilainen
On Fri, 26 Oct 2018 at 12:55, Jonathan Wakely  wrote:
> >> Why not? There are already several, and it helps find bugs. Maybe you
> >> could convince libc++ to change as well if you want to keep the behavior
> >> the same?
> >
> >What bugs?
>
> Assuming the string is empty after a move and appending to it without
> calling clear() first.

I find it bloody unfortunate that the standard considers that a bug.
It seems quite convenient
to me that it's possible to have a bag of strings, move some of them
out to be processed,
and be able to tell the processed ones from the unprocessed ones
without having to separately
clear them when moving.

> >> > It does slightly concern me that some users might
> >> > actually semantically expect a moved-from string to be empty, even
> >> > though that's not guaranteed, although for non-SSO cases
> >> > it *is* guaranteed.
> >>
> >> Is it? In debug mode, I'd be tempted to leave the string as "moved" (size
> >> 5, short string so there is no allocation).
> >
> >Sigh. Apparently it isn't, because the standard doesn't bother placing
> >complexity
> >requirements on string constructors.
>
> Writing 5 bytes into the SSO buffer would be constant complexity :-P

Indeed it would, but the standard doesn't seem to have complexity
requirements on string constructors
at all. If it did, moving a non-sso string would *have to* steal from
the source, and the sso case
would not have to do that, but could.

> >Even so, I'd prefer string acting
> >like vector,
> >so that it will leave the source of a move in an empty state, rather
> >than an unspecified
> >state.
>
> It's not guaranteed for vector either. An allocator with POCMA==false
> doesn't propagate on move assignment and if the source object's
> allocator isn't equal to the target's it will copy, and there's no
> guarantee the source will be empty.

Right, I was talking about homogeneous vectors; it is well-known that
non-propagating allocators
don't move. Except when they do (on move construction).


Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Jonathan Wakely

On 26/10/18 08:25 +0200, Marc Glisse wrote:

On Fri, 26 Oct 2018, Jonathan Wakely wrote:


For the libc++ string zeroing the length of a small string happens to
be faster.


Ah, yes, of course.

Is it? In debug mode, I'd be tempted to leave the string as 
"moved" (size 5, short string so there is no allocation).


That's not a bad idea.


Although we can't do it for std::wstring and std::u32string, as their
small string buffer is *very* small.


"N/A"? The proposition was only semi-serious.

By the way, I was surprised by the formula for the size of the buffer. 
It often has size 16, but for a _CharT of size 3 and alignment 1 
(unlikely I guess), it has size 18.


Oops, that's not intentional.



Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Jonathan Wakely

On 26/10/18 12:16 +0300, Ville Voutilainen wrote:

On Fri, 26 Oct 2018 at 01:42, Marc Glisse  wrote:


On Fri, 26 Oct 2018, Ville Voutilainen wrote:

> I would rather not introduce a behavioral difference between us and
> libc++.

Why not? There are already several, and it helps find bugs. Maybe you
could convince libc++ to change as well if you want to keep the behavior
the same?


What bugs?


Assuming the string is empty after a move and appending to it without
calling clear() first.



> It does slightly concern me that some users might
> actually semantically expect a moved-from string to be empty, even
> though that's not guaranteed, although for non-SSO cases
> it *is* guaranteed.

Is it? In debug mode, I'd be tempted to leave the string as "moved" (size
5, short string so there is no allocation).


Sigh. Apparently it isn't, because the standard doesn't bother placing
complexity
requirements on string constructors.


Writing 5 bytes into the SSO buffer would be constant complexity :-P


Even so, I'd prefer string acting
like vector,
so that it will leave the source of a move in an empty state, rather
than an unspecified
state.


It's not guaranteed for vector either. An allocator with POCMA==false
doesn't propagate on move assignment and if the source object's
allocator isn't equal to the target's it will copy, and there's no
guarantee the source will be empty.

This would be a conforming change to our vector:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1793,7 +1793,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   // so we need to individually move each element.
   this->assign(std::__make_move_if_noexcept_iterator(__x.begin()),
std::__make_move_if_noexcept_iterator(__x.end()));
-   __x.clear();
 }
  }
#endif

That might even have a bigger performance benefit, because clearing a
vector runs destructors, it doesn't just set the length and write a
null terminator.

If you're using a vector as a buffer of objects and the first thing
you do after v1=std::move(v2) is v2.resize(n) then it's a
pessimization to have cleared it. 


Despite the standard not requiring that, it's more useful
programmatically
to have the empty state than the unspecified state, especially when the state
is empty in some cases anyway.


Re: RFC: Allow moved-from strings to be non-empty

2018-10-26 Thread Ville Voutilainen
On Fri, 26 Oct 2018 at 01:42, Marc Glisse  wrote:
>
> On Fri, 26 Oct 2018, Ville Voutilainen wrote:
>
> > I would rather not introduce a behavioral difference between us and
> > libc++.
>
> Why not? There are already several, and it helps find bugs. Maybe you
> could convince libc++ to change as well if you want to keep the behavior
> the same?

What bugs?

> > It does slightly concern me that some users might
> > actually semantically expect a moved-from string to be empty, even
> > though that's not guaranteed, although for non-SSO cases
> > it *is* guaranteed.
>
> Is it? In debug mode, I'd be tempted to leave the string as "moved" (size
> 5, short string so there is no allocation).

Sigh. Apparently it isn't, because the standard doesn't bother placing
complexity
requirements on string constructors. Even so, I'd prefer string acting
like vector,
so that it will leave the source of a move in an empty state, rather
than an unspecified
state. Despite the standard not requiring that, it's more useful
programmatically
to have the empty state than the unspecified state, especially when the state
is empty in some cases anyway.


Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Marc Glisse

On Fri, 26 Oct 2018, Jonathan Wakely wrote:


For the libc++ string zeroing the length of a small string happens to
be faster.


Ah, yes, of course.

Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 
5, short string so there is no allocation).


That's not a bad idea.


Although we can't do it for std::wstring and std::u32string, as their
small string buffer is *very* small.


"N/A"? The proposition was only semi-serious.

By the way, I was surprised by the formula for the size of the buffer. It 
often has size 16, but for a _CharT of size 3 and alignment 1 (unlikely I 
guess), it has size 18.


--
Marc Glisse


Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Jonathan Wakely

On 26/10/18 00:17 +0100, Jonathan Wakely wrote:

On 26/10/18 00:42 +0200, Marc Glisse wrote:

On Fri, 26 Oct 2018, Ville Voutilainen wrote:


I would rather not introduce a behavioral difference between us and
libc++.


Why not? There are already several, and it helps find bugs. Maybe 
you could convince libc++ to change as well if you want to keep the 
behavior the same?


For the libc++ string zeroing the length of a small string happens to
be faster. See Howard's answer at the stackoverflow link I gave. He
says:

'So in summary, the setting of the source to an empty string is necessary in "long 
mode" to transfer ownership of the pointer, and also necessary in short mode for 
performance reasons to avoid a broken pipeline.'

That's not the case for our SSO string, which wasn't designed from the
ground up to make this move constructor optimal. We can choose whether
to leave moved-from strings empty or not.

I'm not persuaded that preserving portable behaviour between
implementations is useful here. This is unspecified behaviour. I agree
with Marc that introducing a difference helps find bugs, and teaches
people not to rely on unspecified properties.

I haven't been able to measure any performance difference, and in many
cases the writes to the rvalue will be dead stores if the object is
about to be destroyed anyway. But the code does look slightly better
(caveat: I don't know what I'm talking about).

Before:

  leaq16(%rdi), %rdx
  movq%rdi, %rax
  movq%rdx, (%rdi)
  movq(%rsi), %rcx
  leaq16(%rsi), %rdx
  cmpq%rdx, %rcx
  je  .L5
  movq%rcx, (%rdi)
  movq16(%rsi), %rcx
  movq%rcx, 16(%rdi)
.L3:
  movq8(%rsi), %rcx
  movq%rdx, (%rsi)
  movq$0, 8(%rsi)
  movq%rcx, 8(%rax)
  movb$0, 16(%rsi)
  ret
.L5:
  movdqu  16(%rsi), %xmm0
  movups  %xmm0, 16(%rdi)
  jmp .L3


After:

  leaq16(%rdi), %rdx
  movq%rdi, %rax
  movq%rdx, (%rdi)
  movq8(%rsi), %rdx
  movq(%rsi), %rcx
  movq%rdx, 8(%rdi)
  leaq16(%rsi), %rdx
  cmpq%rdx, %rcx
  je  .L5
  movq%rcx, (%rdi)
  movq16(%rsi), %rcx
  movq%rdx, (%rsi)
  movq%rcx, 16(%rdi)
  movq$0, 8(%rsi)
  movb$0, 16(%rsi)
  ret
.L5:
  movdqu  16(%rsi), %xmm0
  movups  %xmm0, 16(%rdi)
  ret



It does slightly concern me that some users might
actually semantically expect a moved-from string to be empty, even
though that's not guaranteed, although for non-SSO cases
it *is* guaranteed.


Is it? In debug mode, I'd be tempted to leave the string as "moved" 
(size 5, short string so there is no allocation).


That's not a bad idea.


Although we can't do it for std::wstring and std::u32string, as their
small string buffer is *very* small.



Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Jonathan Wakely

On 26/10/18 00:42 +0200, Marc Glisse wrote:

On Fri, 26 Oct 2018, Ville Voutilainen wrote:


I would rather not introduce a behavioral difference between us and
libc++.


Why not? There are already several, and it helps find bugs. Maybe you 
could convince libc++ to change as well if you want to keep the 
behavior the same?


For the libc++ string zeroing the length of a small string happens to
be faster. See Howard's answer at the stackoverflow link I gave. He
says:

'So in summary, the setting of the source to an empty string is necessary in "long 
mode" to transfer ownership of the pointer, and also necessary in short mode for 
performance reasons to avoid a broken pipeline.'

That's not the case for our SSO string, which wasn't designed from the
ground up to make this move constructor optimal. We can choose whether
to leave moved-from strings empty or not.

I'm not persuaded that preserving portable behaviour between
implementations is useful here. This is unspecified behaviour. I agree
with Marc that introducing a difference helps find bugs, and teaches
people not to rely on unspecified properties.

I haven't been able to measure any performance difference, and in many
cases the writes to the rvalue will be dead stores if the object is
about to be destroyed anyway. But the code does look slightly better
(caveat: I don't know what I'm talking about).

Before:

   leaq16(%rdi), %rdx
   movq%rdi, %rax
   movq%rdx, (%rdi)
   movq(%rsi), %rcx
   leaq16(%rsi), %rdx
   cmpq%rdx, %rcx
   je  .L5
   movq%rcx, (%rdi)
   movq16(%rsi), %rcx
   movq%rcx, 16(%rdi)
.L3:
   movq8(%rsi), %rcx
   movq%rdx, (%rsi)
   movq$0, 8(%rsi)
   movq%rcx, 8(%rax)
   movb$0, 16(%rsi)
   ret
.L5:
   movdqu  16(%rsi), %xmm0
   movups  %xmm0, 16(%rdi)
   jmp .L3


After:

   leaq16(%rdi), %rdx
   movq%rdi, %rax
   movq%rdx, (%rdi)
   movq8(%rsi), %rdx
   movq(%rsi), %rcx
   movq%rdx, 8(%rdi)
   leaq16(%rsi), %rdx
   cmpq%rdx, %rcx
   je  .L5
   movq%rcx, (%rdi)
   movq16(%rsi), %rcx
   movq%rdx, (%rsi)
   movq%rcx, 16(%rdi)
   movq$0, 8(%rsi)
   movb$0, 16(%rsi)
   ret
.L5:
   movdqu  16(%rsi), %xmm0
   movups  %xmm0, 16(%rdi)
   ret



It does slightly concern me that some users might
actually semantically expect a moved-from string to be empty, even
though that's not guaranteed, although for non-SSO cases
it *is* guaranteed.


Is it? In debug mode, I'd be tempted to leave the string as "moved" 
(size 5, short string so there is no allocation).


That's not a bad idea.



Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Marc Glisse

On Fri, 26 Oct 2018, Ville Voutilainen wrote:


I would rather not introduce a behavioral difference between us and
libc++.


Why not? There are already several, and it helps find bugs. Maybe you 
could convince libc++ to change as well if you want to keep the behavior 
the same?



It does slightly concern me that some users might
actually semantically expect a moved-from string to be empty, even
though that's not guaranteed, although for non-SSO cases
it *is* guaranteed.


Is it? In debug mode, I'd be tempted to leave the string as "moved" (size 
5, short string so there is no allocation).


--
Marc Glisse


Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Marc Glisse

On Thu, 25 Oct 2018, Jonathan Wakely wrote:


When an SSO string is contained in the small string buffer or has an
unequal allocator a move operation performs a copy, leaving the original
data in the moved-from string. Setting the length of the moved-from
string to zero is not required, so we can avoid two writes (to the
length member and to set the first character to nul) by leaving the
moved-from string unchanged.

This might be surprising to some users, who (wrongly) expect a string
to always be empty after a move. Is that acceptable?

It's worth noting that the current behaviour, where we do more work
than required, also surprises some people, e.g.
https://stackoverflow.com/q/52696413/981959

Some tests need to be adjusted due to the new behaviour, but they have
comments saying they're testing specific implementation details, so I
don't think needing to adjust them is an argument against making the
change:

// NOTE: This makes use of the fact that we know how moveable
// is implemented on string (via swap). If the implementation changed
// this test may begin to fail.


Should we make this change?


I didn't read the patch, but from the description above, I say yes.

Interesting that I got my relocate patch in before, because after your 
patch relocate probably helps a little less than before.


--
Marc Glisse


Re: RFC: Allow moved-from strings to be non-empty

2018-10-25 Thread Ville Voutilainen
On Fri, 26 Oct 2018 at 00:54, Jonathan Wakely  wrote:
>
> When an SSO string is contained in the small string buffer or has an
> unequal allocator a move operation performs a copy, leaving the original
> data in the moved-from string. Setting the length of the moved-from
> string to zero is not required, so we can avoid two writes (to the
> length member and to set the first character to nul) by leaving the
> moved-from string unchanged.
>
> This might be surprising to some users, who (wrongly) expect a string
> to always be empty after a move. Is that acceptable?
>
> It's worth noting that the current behaviour, where we do more work
> than required, also surprises some people, e.g.
> https://stackoverflow.com/q/52696413/981959
>
> Some tests need to be adjusted due to the new behaviour, but they have
> comments saying they're testing specific implementation details, so I
> don't think needing to adjust them is an argument against making the
> change:
>
> // NOTE: This makes use of the fact that we know how moveable
> // is implemented on string (via swap). If the implementation changed
> // this test may begin to fail.
>
>
> Should we make this change?

I would rather not introduce a behavioral difference between us and
libc++. It does slightly concern me that some users might
actually semantically expect a moved-from string to be empty, even
though that's not guaranteed, although for non-SSO cases
it *is* guaranteed.