Tomas Vondra <tomas.von...@enterprisedb.com> writes:

>> I guess both of you are talking about the worker process; here is
>> what's in my mind:
>> 
>> *btbuild* also lets the WORKER dump the tuples into the Sharedsort struct
>> and lets the LEADER merge them directly. I think the aim of this design
>> is that it can potentially save a mergerun. In the current patch, the
>> worker dumps to a local tuplesort and mergeruns it, and then the leader
>> runs the merge again. I admit the goal of this patch is reasonable, but
>> I feel we need to adopt this approach conditionally somehow, and if we
>> find a way, we can apply it to btbuild as well.
>> 
>
> I'm a bit confused about what you're proposing here, or how it is
> related to what this patch is doing and/or to what Matthias
> mentioned in his e-mail from last week.
>
> Let me explain the relevant part of the patch, and how I understand the
> improvement suggested by Matthias. The patch does the work in three phases:

Here is what I have in mind:

1. WORKER-1

Tempfile 1:

key1:  1
key3:  2
...

Tempfile 2:

key5:  3
key7:  4
...

2. WORKER-2

Tempfile 1:

key2: 1
key6: 2
...

Tempfile 2:

key3: 3
key6: 4
...

In the above example, if we do the merge only in the LEADER, just one
mergerun is needed: it reads 4 tempfiles (8 tuples in total) and writes 8 tuples.

If we add another mergerun in the WORKERs, the result will be:

WORKER1:  reads 2 tempfiles (4 tuples) and writes 1 tempfile (called X) with
4 tuples.
WORKER2:  reads 2 tempfiles (4 tuples) and writes 1 tempfile (called Y) with
4 tuples.

LEADER:  reads 2 tempfiles (X & Y, 8 tuples in total) and writes them into
the final tempfile.
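
Adding up the tuple traffic of the two plans (counting only the merge
phases, not the initial worker sorts):

merge only in LEADER:
  LEADER:   read 8, write 8                      => 16 tuples moved

extra mergerun in the WORKERs:
  WORKER1:  read 4, write 4 (X)
  WORKER2:  read 4, write 4 (Y)
  LEADER:   read 8, write 8                      => 32 tuples moved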

So the intermediate results X & Y require some extra effort (one extra
read+write pass over the data), which is why I think the "extra mergerun in
worker" is *not always* a win. My proposal is that we need to distinguish
the cases and decide in which ones we should add the "extra mergerun in
worker" step.

> The trouble with (2) is that it "just copies" data from one tuplesort
> into another, increasing the disk space requirements. In an extreme
> case, when nothing can be combined, it pretty much doubles the amount of
> disk space, and makes the build longer.

This sounds like the same issue I talked about above. My proposal is to
figure out which cost is bigger, "the cost saving from merging TIDs in the
WORKERS" or "the cost paid for the extra copy", and do the extra step only
when we are sure we can benefit from it. I know this is hard, though, and
I'm not sure it is doable.
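
Just to show the shape such a check could take, here is a hypothetical
sketch; the struct, the function and the cost model are all mine (not from
the patch), and the leader_weight knob would need real measurements to
justify:

#include <stdbool.h>

typedef struct WorkerMergeCostInput
{
    double      run_bytes;          /* bytes in this worker's sorted runs */
    double      merged_bytes;       /* estimated bytes left after combining
                                     * the TID lists of duplicate keys */
    double      leader_weight;      /* how much more a byte processed in the
                                     * (serial) leader costs than one
                                     * processed in a (parallel) worker */
} WorkerMergeCostInput;

static bool
worker_mergerun_is_win(const WorkerMergeCostInput *in)
{
    /* cost paid for the extra copy: re-read the runs, write the merged run */
    double      extra_copy = in->run_bytes + in->merged_bytes;

    /* cost saving: the (serial) leader reads and merges less data */
    double      leader_saving = in->leader_weight *
                                (in->run_bytes - in->merged_bytes);

    return leader_saving > extra_copy;
}

The hard part, as said above, is estimating merged_bytes (i.e. how many
duplicate keys each worker will actually see) before doing the work.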

> What I think Matthias is suggesting, is that this "TID list merging"
> could be done directly as part of the tuplesort in step (1). So instead
> of just moving the "sort tuples" from the appropriate runs, it could
> also do an optional step of combining the tuples and writing this
> combined tuple into the tuplesort result (for that worker).

OK, I get it now. So we have talked about several merges so far, at
different stages and for different sets of tuples:

1. "GIN deform buffer" did the TIDs merge for the same key for the tuples
in one "deform buffer" batch, as what the current master is doing.

2. "in memory buffer sort" stage, currently there is no TID merge so
far and Matthias suggest that. 

3. Merging the TIDs for the same keys in the LEADER only, vs. in the
WORKER first and then in the LEADER; this is what your 0002 commit does
now, and I raised some concerns about it above.
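
To make (2) a bit more concrete, below is a rough, self-contained sketch of
the kind of combining I understand Matthias to be suggesting. All types and
names are made up for illustration (the real thing would live inside
tuplesort and work on the patch's GIN tuples); the point is only that, once
the buffer is sorted by (key, first TID), entries with the same key are
adjacent and their TID lists can be concatenated while the buffer is written
out, assuming the per-entry TID lists within one worker do not overlap:

#include <stdlib.h>
#include <string.h>

typedef struct ItemPtr
{
    unsigned int    block;          /* stand-in for BlockNumber */
    unsigned short  offset;         /* stand-in for OffsetNumber */
} ItemPtr;

typedef struct GinBufferEntry
{
    int         key;                /* stand-in for (attnum, key datum) */
    int         ntids;
    ItemPtr    *tids;               /* TID list, sorted */
} GinBufferEntry;

/*
 * Write out a buffer already sorted by (key, first TID), emitting one
 * combined entry per distinct key.  write_entry() stands in for "append
 * this tuple to the worker's tuplesort run".
 */
static void
flush_sorted_buffer(const GinBufferEntry *entries, int n,
                    void (*write_entry)(const GinBufferEntry *))
{
    int     i = 0;

    while (i < n)
    {
        int             j = i;
        int             total = 0;
        GinBufferEntry  combined;

        /* find the run of adjacent entries sharing the same key */
        while (j < n && entries[j].key == entries[i].key)
            total += entries[j++].ntids;

        combined.key = entries[i].key;
        combined.ntids = total;
        combined.tids = malloc(total * sizeof(ItemPtr));

        /* the lists are ordered and non-overlapping, so just append them */
        total = 0;
        for (int k = i; k < j; k++)
        {
            memcpy(combined.tids + total, entries[k].tids,
                   entries[k].ntids * sizeof(ItemPtr));
            total += entries[k].ntids;
        }

        write_entry(&combined);
        free(combined.tids);
        i = j;
    }
}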

> Matthias also mentioned this might be useful when building btree indexes
> with key deduplication.

> AFAICS this might work, although it probably requires for the "combined"
> tuple to be smaller than the sum of the combined tuples (in order to fit
> into the space). But at least in the GIN build in the workers this is
> likely true, because the TID lists do not overlap (and thus not hurting
> the compressibility).
>
> That being said, I still see this more as an optimization than something
> required for the patch,

If the GIN deform buffer is big enough (say, larger than the in-memory
buffer sort), would we gain anything from this, since the scope is only
the tuples in the in-memory buffer sort?

> and I don't think I'll have time to work on this
> anytime soon. The patch is not extremely complex, but it's not trivial
> either. But if someone wants to take a stab at extending tuplesort to
> allow this, I won't object ...

Agree with this. I am more interested in understanding the whole design
and the scope of what this patch should fix, and then I can do some code
review and testing; for now I am still in the "understanding design and
scope" stage. If I'm too slow with this patch, please feel free to
commit it at any time; I don't expect to find any valuable improvements
or bugs. I probably need another 1~2 weeks to study this patch.

-- 
Best Regards
Andy Fan


