> On Jun 27, 2017, at 3:30 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Tue, Jun 27, 2017 at 12:40 PM, armand pirvu <armand.pi...@gmail.com> wrote:
>> so how is it working in fact ? Isn't it working like looping in the
>> IVEE.dim_company and for each company_id if the record does have a
>> correspondent in csischema.dim_company then update csischema.dim_company set
>> company_name = EXCLUDED.company_name where company_id=... ? If so isn't it
>> supposed to use the PK for each company_id ? Or is it more like building a
>> whole list from IVEE.dim_company and treat like a join ? Just trying to
>> understand
>
> The processing here has to happen a tuple at a time. That's because
> the index structure itself is the only authoritative source of truth
> about whether or not there is a possible conflict. An MVCC snapshot
> isn't good enough, because it sees a consistent view of things, not
> the true physical reality of what exists or does not exist in the
> index.
>
> What you end up with here is a weird nested loop join, you might say.
> The implementation couldn't do it any other way (this could never
> behave more like a merge join), because we must eagerly check for
> conflicts right as we insert (our insert would be registered by
> *other* inserters/upserters as a conflict). If the implementation did
> ever do it that way, it would break the important UPSERT guarantees
> around concurrency.
>
> MERGE does this in other systems, which is okay for those other
> systems because MERGE makes no special promises about concurrency
> (e.g., you can get a unique violation in the joined-on column with
> MERGE). But, MERGE would be faster for bulk loading, which is what
> MERGE is good for.
>
> --
> Peter Geoghegan
Hi Peter
So for example if IVEE.dim_company has 10k rows and csischema.dim_company has
40 rows, what will happen for each row in IVEE.dim_company we check
csischema.dim_company and if the check tells row is in it switches to update
and this would mean yes I scan IVEE.dim_company, however should an update be
needed in csischema.dim_company it will use the csischema.dim_company PK since
we pass one value gotten from IVEE.dim_company
The question I guess is what happens IF I IVEE.dim_company accounts for far
more than 5% of csischema.dim_company ? Will that translate into a scan on
csischema.dim_company ?
What I am looking at now looks like a potential racing contention which so I am
wondering if there are better ways to do it
Thanks
Armand