2013/5/13 Noah Misch <n...@leadboat.com>

> * Planner & Similar Issues
>
> We should decide whether to actually sort in parallel based on the
> comparator cost and the data size.  The system currently has no
> information on comparator cost: bt*cmp (and indeed almost all built-in
> functions) all have procost=1, but bttextcmp is at least 1000x slower
> than btint4cmp.  Let's improve the procost estimates of all core B-tree
> and hash operators.  This will have other benefits, but we will need to
> be cognizant of the risk of upsetting setups that have tuned
> cpu_operator_cost based on the present situation.
>
> The choice of whether to parallelize can probably be made in a manner
> similar to the choice to do an external sort: the planner guesses the
> outcome for costing purposes, but the actual decision is made at
> execution time.  The planner would determine a tuple count cutoff at
> which parallelism becomes favorable, and tuplesort would check that to
> establish its actual decision.
>
This touches on a problem I have been thinking about: off-loading CPU-bound
workloads, which I partially tried to implement in the writable foreign
table feature. Beyond sorting, I think it would be worthwhile to have the
capability to push heavy workloads (such as sorts, aggregates, or complex
target-lists) out to external computing resources.
However, I doubt whether the decision to parallelize should be made at
execution time rather than at plan stage. For example, when we have a
large enough number of records and a 10-core multiprocessor, the wise
plan may load the data in parallel across 10 processors, partial-sort on
each of the 10 processors individually, then merge-sort the results.
That needs a fundamentally different tree structure from the traditional
single-processor plan tree. So it seems to me we should pursue an
enhancement that allows injecting special-purpose parallel processing
plan nodes into the plan tree.
What is your opinion?

Thanks,
