2016-12-20 9:11 GMT+01:00 Andres Freund <and...@anarazel.de>:

> On 2016-12-19 15:25:42 -0500, Robert Haas wrote:
> > On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut
> > <peter.eisentr...@2ndquadrant.com> wrote:
> > > On 12/9/16 7:52 AM, Robert Haas wrote:
> > >> It's kind of ironic, at least IMHO, that the DirectionFunctionCall
> > >> macros are anything but direct.  Each one is a non-inlined function
> > >> call that does a minimum of 8 variable assignments before actually
> > >> calling the function.
> > >
> > > If this is a problem (it might be), then we can just make those calls,
> > > er, direct C function calls to different function.  For example,
> > >
> > >     result = DatumGetObjectId(DirectFunctionCall1(oidin,
> > >                               CStringGetDatum(pro_name_or_oid)));
> > >
> > > could just be
> > >
> > >     result = oidin_internal(pro_name_or_oid);
> >
> > Yeah, true -- that works for simple cases, and might be beneficial
> > where you're doing lots and lots of calls in a tight loop.
> >
> > For the more general problem, I wonder if we should try to figure out
> > something where we have one calling convention for "simple" functions
> > (those that little or none of the information in fcinfo and can pretty
> > much just take their SQL args as C args) and another for "complex"
> > functions (where we do the full push-ups).
>
> Unfortunately I think so would likely also increase overhead in a bunch
> of places, because suddenly we need branches determining which callling
> convention is in use. There's a fair amount of places, some of them
> performance sensitive, that deal with functions that can get different
> number of arguments. Prominently nodeAgg.c's transition functions for
> example.
>
> When JITing expressions that doesn't matter, because we avoid doing so
> repetitively. But that's not really sufficient imo, non JITed
> performance matters a lot too.
>
> There's imo primarily two parts that make our calling convention
> expensive:
> a) Additional instructions required to push to/from fcinfo->arg[null],
>    including stack spilling required to have space for the arguments.
> b) Increased number of cachelines touched, even for even trivial
>    functions. We need to read/write to at least five:
>    1) fcinfo->isnull to reset it
>    2) fcinfo->arg[0...] to set the argument
>    3) fcinfo->argnull[0...] to set argnull (separate cacheline)
>    4) fcinfo->flinfo->fn_addr to get the actual address to call
>    5) instruction cache miss for the function's contents
>
> We can doctor around this some. A retively easy way to reduce the
> overhead of a similar function call interface would be to restructure
> things so we have something like:
>
> typedef struct FunctionArg2
> {
>         Datum arg;
>         bool argnull;
> } FunctionArg2;
>
> typedef struct FunctionCallInfoData2
> {
>         /* cached function address from ->flinfo */
>         PGFunction      fn_addr;
>         /* ptr to lookup info used for this call */
>         FmgrInfo       *flinfo;
>         /* function must set true if result is NULL */
>         bool            isnull;
>         /* # arguments actually passed */
>         short           nargs;
>         /* collation for function to use */
>         Oid             fncollation;    /* collation for function to use */
>         /* pass or return extra info about result */
>         fmNodePtr       resultinfo;
>         /* array big enough to fit flinfo->fn_nargs */
>         FunctionArg2    args[FLEXIBLE_ARRAY_MEMBER];
> } FunctionCallInfoData2;
>
> That'd mean some code changes because accessing arguments can't be done
> quite as now, but it'd be fairly minor.  We'd end up with a lot fewer
> cache misses for common cases, because there's no need to fetch fn_addr,
> and because the first two arg/argnull pairs still fit within the first
> cacheline (which they never can in the current interface!).
>
> But we'd still be passing arguments via memory, instead of using
> registers.
>
> I think a more efficient variant would make the function signature look
> something like:
>
> Datum /* directly returned argument */
> pgfunc(
>         /* extra information about function call */
>         FunctionCallInfo *fcinfo,
>         /* bitmap of NULL arguments */
>         uint64_t nulls,
>         /* first argument */
>         Datum arg0,
>         /* second argument */
>         Datum arg1,
>         /* returned NULL */
>         bool *isnull
> );
>
> with additional arguments passed via fcinfo as currently done. Since
> most of the performance critical function calls only have two arguments
> that should allow to keep usage of fcinfo-> to the cases where we need
> the extra information.  On 64bit linuxes the above will be entirely in
> registers, on 64bit windows isnull would be placed on the stack.
>
> I don't quite see how we could avoid stack usage at all for windows, any
> of the above arguments seems important.
>
> There'd be a need for some trickery to make PG_GETARG_DATUM() work
> efficiently - afaics the argument numbers passed to PG_GETARG_*() are
> always constants, so that shouldn't be too bad.
>

In this case some benchmark can be very important (and interesting). I am
not sure if faster function execution has significant benefit on Vulcano
like executor.

Regards

Pavel


>
> Regards,
>
> Andres
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Reply via email to