Hi,
Update is still using the converter step with bad record handling.

In the update-by-dataframe scenario there is no need for bad record handling;
we only need to keep it for the update-by-value case.
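
Very roughly, the idea is something like the sketch below (the step names are
made up for illustration, not the actual CarbonData classes):

    // Hypothetical pipeline selection for the update load:
    // the converter step with bad record handling is only needed when the new
    // values come from user-supplied literals (update by value); an update fed
    // by a DataFrame is already typed, so the step can be dropped.
    def updateLoadSteps(updateByDataFrame: Boolean): Seq[String] = {
      val core = Seq("inputStep", "sortStep", "writeStep")
      if (updateByDataFrame) core
      else "converterStepWithBadRecordHandling" +: core
    }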

Skipping that step can give a significant improvement, as we already observed
in the insert flow.

I tried once to route the update through the new insert-into flow, but the
plan rearrange failed because of the implicit column.
I didn't continue with it because of other work.
Maybe I have to look into it again and see if it can work.

Thanks,
Ajantha

On Thu, May 14, 2020 at 9:51 AM haomarch <marchp...@126.com> wrote:

> I have several ideas to optimize the update performance:
> 1. Reduce the storage size of tupleId:
>    The tupleId is too long, which leads to heavy shuffle IO overhead while
> joining the change table with the target table.
> 2. Avoid converting String to UTF8String in the row processing.
>    Before writing rows into the delta files, the conversion from String to
> UTF8String hampers performance (a sketch follows after this list).
>    Code: "UTF8String.fromString(row.getString(tupleId))"
> 3. For DELETE ops in the MergeDataCommand, we shouldn't let all the columns
> of the change table take part in the JOIN op; only the "key" column is
> needed (also sketched below).
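>
> A rough sketch of ideas 2 and 3, assuming the writer side sees Spark's
> InternalRow and the merge uses the DataFrame API (the "key" column name is
> just illustrative):
>
>     import org.apache.spark.sql.DataFrame
>     import org.apache.spark.sql.catalyst.InternalRow
>     import org.apache.spark.unsafe.types.UTF8String
>
>     // Idea 2: when the row is a Catalyst InternalRow, the tuple id can be read
>     // as UTF8String directly, avoiding UTF8String.fromString(row.getString(...)).
>     def tupleIdOf(row: InternalRow, ordinal: Int): UTF8String =
>       row.getUTF8String(ordinal)
>
>     // Idea 3: for DELETE, project the change table down to the key column before
>     // the join, so only the key is shuffled; a left-semi join keeps matching
>     // target rows without duplicating them.
>     def rowsToDelete(target: DataFrame, change: DataFrame, key: String): DataFrame =
>       target.join(change.select(key), Seq(key), "leftsemi")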
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
