Re: [proto] Holding children by copy or reference

2013-10-14 Thread Luc Danton

On 2013-09-30 13:54, Mathias Gaunard wrote:

> Hi,
>
> A while ago, I recommended setting up domains so that Proto holds its
> children by value, except for terminals, which should be held by reference
> or by value depending on their lvalue-ness. This avoids dangling-reference
> problems when storing expressions or using 'auto'.
> I also said there was no overhead to doing this in the case of Boost.SIMD.
>
> After further analysis with more complex code, it appears that there is
> indeed an overhead: it confuses the compiler's alias analysis, which then
> fails to perform some optimizations that it would otherwise perform.
>
> For example, an expression like this:
>     r = a*b + a*b;
>
> will no longer be optimized to
>     tmp = a*b;
>     r = tmp + tmp;
>
> If terminals are held by reference, the compiler can also emit extra
> loads, which it doesn't do if the terminal is held by value or if
> all children are held by reference.
>
> It is a bit surprising that this affects compiler optimizations in this
> way, but it is reproducible on both Clang and GCC, with all versions I
> have access to.
>
> Therefore, to avoid performance issues, I'm considering moving to always
> using references (with the default domain behaviour), and relying on
> BOOST_FORCEINLINE to make it work as expected.
> Of course, this has the caveat that if forced inlining is disabled (or
> doesn't work), you'll get segmentation faults.


Hello,

As a heads-up, I've made it a habit in C++11 to structure generic
'holder' types like this:


    template<typename Some, typename Parameters, typename Here>
    struct foo_type {
        // Encapsulation omitted for brevity

        foo_type(Some&& some, Parameters&& parameters, Here&& here)
            // Don't use std::move here
            : some(std::forward<Some>(some))
            , parameters(std::forward<Parameters>(parameters))
            , here(std::forward<Here>(here))
        {}

        Some some;
        Parameters parameters;
        Here here;

        /* How to use the data members: */

        /* example observer */
        Some& peek() { return some; }
        /* can be cv-qualified */
        Some const& peek() const { return some; }
        /* can be ref-qualified */
        Parameters&& fetch() &&
        { return std::forward<Parameters>(parameters); }

        /* meant to be called several times per lifetime */
        decltype(auto) bar()
            /* can be cv- and ref-qualified indifferently */
        {
            // don't forward, don't move
            return qux(some, parameters, here);
        }

        /* meant to be called at most once per lifetime */
        void zap()
        {
            // forwarding is a low-hanging optimization
            blast(std::forward<Some>(some));
        }
    };

    template<typename Some, typename Parameters, typename Here>
    foo_type<Some, Parameters, Here> foo(Some&& some
                                        , Parameters&& parameters
                                        , Here&& here)
    {
        return {
            std::forward<Some>(some)
            , std::forward<Parameters>(parameters)
            , std::forward<Here>(here)
        };
    }

Note that either auto f = foo(0, 'a', c); or auto&& f = foo(0, 'a', c);
is fine, with no dangling reference. Rvalue arguments to the foo
factory are stored as values, lvalue arguments as lvalue references. You
can still ask for rvalue reference members 'by hand' (e.g.
foo_type<int&&, int&&, int&&> f { std::move(i), std::move(i),
std::move(i) };), although I don't really use that functionality (save
with std::tuple, but that's another story).


For something like auto f = foo(1, 2, 3); auto g = foo(f, 4, 5); the
ints inside g would be held as values, and f as a reference. Had
std::move(f) been used, f would have been moved into a copy internal
to g. In EDSL terms, both nodes and terminals can be held either as
references or as values, depending on how they are passed as
arguments.
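
The deduction rules described above can be checked with a stripped-down version of the holder. This is only a sketch: it keeps the constructor, the three members and the factory, and holder_demo is a hypothetical helper name introduced purely for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stripped-down holder: constructor, members and factory only.
template<typename Some, typename Parameters, typename Here>
struct foo_type {
    foo_type(Some&& s, Parameters&& p, Here&& h)
        : some(std::forward<Some>(s))
        , parameters(std::forward<Parameters>(p))
        , here(std::forward<Here>(h))
    {}

    Some some;
    Parameters parameters;
    Here here;
};

// Forwarding references deduce T for rvalue arguments and T& for lvalues.
template<typename Some, typename Parameters, typename Here>
foo_type<Some, Parameters, Here> foo(Some&& some, Parameters&& parameters, Here&& here)
{
    return { std::forward<Some>(some)
           , std::forward<Parameters>(parameters)
           , std::forward<Here>(here) };
}

// Hypothetical helper that exercises the cases discussed in the text.
inline bool holder_demo()
{
    int c = 3;
    auto f = foo(0, 'a', c);
    static_assert(std::is_same<decltype(f), foo_type<int, char, int&>>::value,
                  "rvalues are stored by value, lvalues by reference");

    auto g = foo(f, 4, 5);
    static_assert(std::is_same<decltype(g), foo_type<decltype(f)&, int, int>>::value,
                  "an lvalue node is held by reference");

    auto f2 = foo(1, 2, 3);
    auto h = foo(std::move(f2), 4, 5);
    static_assert(std::is_same<decltype(h), foo_type<decltype(f2), int, int>>::value,
                  "a moved node is held in an internal copy");

    c = 42;  // visible through g.some.here, since c is held by reference
    return &g.some == &f && &h.some != &f2 && g.some.here == 42;
}
```

The static_asserts document the member types the factory ends up choosing; the runtime checks confirm that g refers to f itself while h owns its own copy.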


As I've said, I use this technique as a default, and I do have a
run-of-the-mill lazy-evaluation EDSL where I put it to use. I cannot
report bad things happening (including with libstdc++ debug mode and
when checking with Valgrind). In my experience, when looking at the
generated code, the compiler can see through it most of the time. I
have to warn, though, that I do not use the technique for the sake of
efficiency; I simply find it the most convenient and elegant.


___
proto mailing list
proto@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/proto


Re: [proto] Holding children by copy or reference

2013-10-01 Thread Bart Janssens
On Tue, Oct 1, 2013 at 12:59 AM, Mathias Gaunard
mathias.gaun...@ens-lyon.org wrote:
> To clarify, in terms of performance, from best-to-worst:
> 1) everything by reference: no problem with performance (but problematic
>    dangling references in some scenarios)
> 2) everything by value: no CSE or other optimizations
> 3) nodes by value, terminals by reference: no CSE or other optimizations +
>    loads when accessing the terminals

Just out of interest: would holding the a*b temporary node by rvalue
reference be possible and would it be of any help?

Cheers,

Bart


Re: [proto] Holding children by copy or reference

2013-10-01 Thread Eric Niebler
On 10/1/2013 12:05 AM, Bart Janssens wrote:
> On Tue, Oct 1, 2013 at 12:59 AM, Mathias Gaunard
> mathias.gaun...@ens-lyon.org wrote:
>> To clarify, in terms of performance, from best-to-worst:
>> 1) everything by reference: no problem with performance (but problematic
>>    dangling references in some scenarios)
>> 2) everything by value: no CSE or other optimizations
>> 3) nodes by value, terminals by reference: no CSE or other optimizations +
>>    loads when accessing the terminals
>
> Just out of interest: would holding the a*b temporary node by rvalue
> reference be possible and would it be of any help?

Possible in theory, yes. In practice, it probably doesn't work since
proto-v4 is not C++11 aware. But even if it worked, it wouldn't solve
anything. Rvalue refs have the same lifetime issues that (const) lvalue
refs have. The temporary object to which they refer will not outlive the
full expression.
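
To make the lifetime point concrete, here is a small sketch that does not use Proto at all. The names node, by_rref, by_value, hold_rref and hold_value are hypothetical; node merely counts live instances so that the destruction of the temporary can be observed without ever reading through the dangling reference.

```cpp
#include <cassert>
#include <utility>

// Hypothetical node type that counts how many instances are alive.
struct node {
    static int alive;
    node() { ++alive; }
    node(node const&) { ++alive; }
    ~node() { --alive; }
};
int node::alive = 0;

// Keeps its child by rvalue reference: no copy is made.
struct by_rref { node&& child; };

// Keeps its child by value: owns a copy.
struct by_value { node child; };

// Going through a function mirrors what an operator overload does:
// the temporary is bound to a parameter, so its lifetime is not extended.
inline by_rref  hold_rref(node&& n)  { return by_rref{ std::move(n) }; }
inline by_value hold_value(node&& n) { return by_value{ std::move(n) }; }

inline int alive_after_rref()
{
    by_rref r = hold_rref(node());   // the temporary dies at the ';'
    (void)r;                         // r.child dangles now; it is never read
    return node::alive;
}

inline int alive_after_value()
{
    by_value v = hold_value(node()); // the temporary is copied, then dies
    (void)v;
    return node::alive;              // counts the copy owned by v
}
```

With the rvalue-reference holder no node survives the full expression, whereas the by-value holder still owns one.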

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com


[proto] Holding children by copy or reference

2013-09-30 Thread Mathias Gaunard

Hi,

A while ago, I recommended setting up domains so that Proto holds its
children by value, except for terminals, which should be held by reference
or by value depending on their lvalue-ness. This avoids dangling-reference
problems when storing expressions or using 'auto'.

I also said there was no overhead to doing this in the case of Boost.SIMD.

After further analysis with more complex code, it appears that there is
indeed an overhead: it confuses the compiler's alias analysis, which then
fails to perform some optimizations that it would otherwise perform.


For example, an expression like this:
    r = a*b + a*b;

will no longer be optimized to
    tmp = a*b;
    r = tmp + tmp;

If terminals are held by reference, the compiler can also emit extra
loads, which it doesn't do if the terminal is held by value or if
all children are held by reference.
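
For readers unfamiliar with the terminology, the two storage strategies can be sketched with a toy expression template. This is not Proto's machinery, and it does not reproduce the optimization difference reported here; it only shows what "held by reference" and "held by value" mean for the node layout. All names (terminal, mul_node, add_node and the two eval_* helpers) are hypothetical.

```cpp
#include <cassert>

// Hypothetical toy terminal wrapping a single double.
struct terminal {
    double value;
    double eval() const { return value; }
};

// L and R are either `T` (child held by value)
// or `T const&` (child held by reference).
template<typename L, typename R>
struct mul_node {
    L lhs;
    R rhs;
    double eval() const { return lhs.eval() * rhs.eval(); }
};

template<typename L, typename R>
struct add_node {
    L lhs;
    R rhs;
    double eval() const { return lhs.eval() + rhs.eval(); }
};

// a*b + a*b with every child held by reference.
inline double eval_all_by_reference(terminal const& a, terminal const& b)
{
    typedef mul_node<terminal const&, terminal const&> mul_ref;
    typedef add_node<mul_ref const&, mul_ref const&>   add_ref;
    // The temporary nodes live until the end of this full expression,
    // so evaluating inside it is safe; storing the tree would not be.
    return add_ref{ mul_ref{a, b}, mul_ref{a, b} }.eval();
}

// a*b + a*b with every child held by value.
inline double eval_all_by_value(terminal a, terminal b)
{
    typedef mul_node<terminal, terminal> mul_val;
    typedef add_node<mul_val, mul_val>   add_val;
    add_val expr = { mul_val{a, b}, mul_val{a, b} };  // owns copies of a and b
    return expr.eval();                               // safe to store and reuse
}
```

In the by-value tree each a*b node carries its own copies of a and b, which is one plausible reason the compiler no longer recognizes the two subexpressions as identical; that explanation is an assumption, not something measured here.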


It is a bit surprising that this affects compiler optimizations in this
way, but it is reproducible on both Clang and GCC, with all versions I
have access to.


Therefore, to avoid performance issues, I'm considering moving to always
using references (with the default domain behaviour), and relying on
BOOST_FORCEINLINE to make it work as expected.
Of course, this has the caveat that if forced inlining is disabled (or
doesn't work), you'll get segmentation faults.



Re: [proto] Holding children by copy or reference

2013-09-30 Thread Eric Niebler
On 9/30/2013 1:54 PM, Mathias Gaunard wrote:
> Hi,
>
> A while ago, I recommended setting up domains so that Proto holds its
> children by value, except for terminals, which should be held by reference
> or by value depending on their lvalue-ness. This avoids dangling-reference
> problems when storing expressions or using 'auto'.
> I also said there was no overhead to doing this in the case of Boost.SIMD.
>
> After further analysis with more complex code, it appears that there is
> indeed an overhead: it confuses the compiler's alias analysis, which then
> fails to perform some optimizations that it would otherwise perform.
>
> For example, an expression like this:
>     r = a*b + a*b;
>
> will no longer be optimized to
>     tmp = a*b;
>     r = tmp + tmp;

Interesting!

> If terminals are held by reference, the compiler can also emit extra
> loads, which it doesn't do if the terminal is held by value or if
> all children are held by reference.
>
> It is a bit surprising that this affects compiler optimizations in this
> way, but it is reproducible on both Clang and GCC, with all versions I
> have access to.

It's very surprising. I suppose it's because the compiler can't assume
equational reasoning holds for some user-defined type. That's too bad.

> Therefore, to avoid performance issues, I'm considering moving to always
> using references (with the default domain behaviour), and relying on
> BOOST_FORCEINLINE to make it work as expected.

Why is FORCEINLINE needed?

> Of course, this has the caveat that if forced inlining is disabled (or
> doesn't work), you'll get segmentation faults.

I don't understand why that should make a difference. Can you clarify? A
million thanks for doing the analysis and reporting the results, by the way.

As an aside, in Proto v5, terminals and intermediate nodes are captured
as you describe by default, which means perf problems. I still think
this is the right default for C++11, and for most EDSLs. I'll have to be
explicit in the docs about the performance implications, and make it
easy for people to get the by-ref capture behavior when they're ok with
the risks.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com


Re: [proto] Holding children by copy or reference

2013-09-30 Thread Mathias Gaunard

On 30/09/13 08:01, Eric Niebler wrote:


>> Therefore, to avoid performance issues, I'm considering moving to always
>> using references (with the default domain behaviour), and relying on
>> BOOST_FORCEINLINE to make it work as expected.
>
> Why is FORCEINLINE needed?


The scenario is:

    terminal a, b, c, r;

    auto tmp = a*b*c;
    r = tmp + tmp;

Assuming everything is held by reference, by the time tmp is used in the
assignment to r, it holds a dangling reference (to the a*b node).

If everything is inlined, the problem may be avoided because the nodes
then don't need to be present on the stack.


Of course, it's quite hacky.



Re: [proto] Holding children by copy or reference

2013-09-30 Thread Eric Niebler
On 9/30/2013 11:08 AM, Mathias Gaunard wrote:
> On 30/09/13 08:01, Eric Niebler wrote:
>
>>> Therefore, to avoid performance issues, I'm considering moving to always
>>> using references (with the default domain behaviour), and relying on
>>> BOOST_FORCEINLINE to make it work as expected.
>>
>> Why is FORCEINLINE needed?
>
> The scenario is:
>
>     terminal a, b, c, r;
>
>     auto tmp = a*b*c;
>     r = tmp + tmp;
>
> Assuming everything is held by reference, by the time tmp is used in the
> assignment to r, it holds a dangling reference (to the a*b node).
>
> If everything is inlined, the problem may be avoided because the nodes
> then don't need to be present on the stack.

Yikes! You don't need me to tell you that's UB, and you really shouldn't
encourage people to do that.

You can independently control how intermediate nodes are captured, as
opposed to how terminals are captured. In this case, you want a,b,c held
by reference, and the temporary a*b to be held by value. Have you
tried this, and still found it to be slow?
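
As an illustration of that split, here is a toy sketch written without Proto: lvalue operands (the named terminals) are captured by reference, while temporary nodes are moved into their parent. Every name in it (expr_tag, terminal, binary, mul_op, add_op, mixed_capture_demo) is hypothetical, and it says nothing about how the generated code performs.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct expr_tag {};  // marks the toy expression types

struct terminal : expr_tag {
    double value;
    explicit terminal(double v) : value(v) {}
    terminal(terminal const&) = delete;  // terminals can never be copied into a node
    double eval() const { return value; }
};

template<typename T>
struct is_expr : std::is_base_of<expr_tag, typename std::decay<T>::type> {};

struct mul_op { static double apply(double x, double y) { return x * y; } };
struct add_op { static double apply(double x, double y) { return x + y; } };

// L and R come from forwarding references: an lvalue operand yields `T&`
// (held by reference), an rvalue operand yields `T` (held by value).
template<typename Op, typename L, typename R>
struct binary : expr_tag {
    L lhs;
    R rhs;
    binary(L&& l, R&& r) : lhs(std::forward<L>(l)), rhs(std::forward<R>(r)) {}
    double eval() const { return Op::apply(lhs.eval(), rhs.eval()); }
};

template<typename L, typename R,
         typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
binary<mul_op, L, R> operator*(L&& l, R&& r)
{
    return binary<mul_op, L, R>(std::forward<L>(l), std::forward<R>(r));
}

template<typename L, typename R,
         typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
binary<add_op, L, R> operator+(L&& l, R&& r)
{
    return binary<add_op, L, R>(std::forward<L>(l), std::forward<R>(r));
}

inline double mixed_capture_demo()
{
    terminal a(2.0), b(3.0), c(4.0);
    auto tmp = a * b * c;  // the temporary a*b node is moved into tmp
    static_assert(std::is_same<decltype(tmp),
                      binary<mul_op, binary<mul_op, terminal&, terminal&>, terminal&>>::value,
                  "inner node by value, terminals by reference");
    a.value = 1.0;               // seen through tmp: a is held by reference
    return (tmp + tmp).eval();   // tmp is a named lvalue, so referencing it is safe
}
```

Because tmp owns the a*b node, the statement r = tmp + tmp from the scenario above no longer reads through a dangling reference; only the terminals, which outlive tmp, are referenced.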

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com