[ 
https://issues.apache.org/jira/browse/IMPALA-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542483#comment-17542483
 ] 

ASF subversion and git services commented on IMPALA-10465:
----------------------------------------------------------

Commit 4236c307b971881a3b1d85068db5b053a9c34cfa in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4236c307b ]

IMPALA-10465: Use IGNORE variant of Kudu write operations

KUDU-1563 added support for INSERT_IGNORE, UPDATE_IGNORE, and
DELETE_IGNORE to handle cases where users want to ignore primary key
errors efficiently. Impala already does this today for its INSERT
behavior. However, it does so by ignoring the per-row errors from Kudu
client side. This requires a large error buffer (which may need to be
expanded in rare cases) to log all of the warning messages which users
often do not care about and causes significant RPC overhead.

This patch changes Impala's Kudu write operations to use
INSERT_IGNORE, UPDATE_IGNORE, and DELETE_IGNORE if the Kudu cluster
supports them and the backend flag "kudu_ignore_conflicts" is true.
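
The selection rule described above can be sketched roughly as follows. This
is an illustrative Python sketch only, not Impala's backend code: the
function name choose_write_op and the parameter cluster_supports_ignore are
hypothetical, and only the flag name "kudu_ignore_conflicts" and the three
operation names come from the commit message.

```python
# Hypothetical sketch of the selection rule; not Impala's actual code.
# Only the operation names and the flag name come from the commit message.
IGNORE_VARIANTS = {
    "INSERT": "INSERT_IGNORE",
    "UPDATE": "UPDATE_IGNORE",
    "DELETE": "DELETE_IGNORE",
}


def choose_write_op(op, cluster_supports_ignore, kudu_ignore_conflicts):
    """Return the Kudu write operation name to use for a DML operation.

    The IGNORE variant is chosen only when both conditions hold: the
    cluster supports it and the kudu_ignore_conflicts flag is true.
    Otherwise the plain operation is returned unchanged.
    """
    if cluster_supports_ignore and kudu_ignore_conflicts:
        # Operations without a listed IGNORE variant fall through as-is.
        return IGNORE_VARIANTS.get(op, op)
    return op


print(choose_write_op("INSERT", True, True))
print(choose_write_op("INSERT", False, True))
```

When either condition is false, the sketch returns the plain operation,
which corresponds to the pre-patch behavior of ignoring per-row errors on
the client side.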

We benchmarked the change by running insert and update queries on a
modified tpch.lineitem table in which conflicts were introduced for
around half of the total rows being modified. The table below shows the
performance difference after the patch:

+----------------------+--------+-------------+------------+------------+----------------+
| Query                | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) |
+----------------------+--------+-------------+------------+------------+----------------+
| KUDU-IGNORE-3-UPDATE | 30.06  | 30.52       |   -1.53%   |   0.18%    |   0.58%        |
| KUDU-IGNORE-2-INSERT | 48.91  | 71.09       | I -31.20%  |   0.60%    |   0.72%        |
+----------------------+--------+-------------+------------+------------+----------------+
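
For readers checking the Delta(Avg) column, the figures appear to be the
relative change of Avg(s) against Base Avg(s). The sketch below assumes that
formula (it is not taken from the benchmark tooling), and the helper name
delta_pct is hypothetical.

```python
# Assumed formula: relative change of the new average against the base.
def delta_pct(avg_s, base_avg_s):
    """Percentage change of avg_s relative to base_avg_s."""
    return (avg_s - base_avg_s) / base_avg_s * 100.0


# Averages as displayed (already rounded) in the table.
insert_delta = delta_pct(48.91, 71.09)
update_delta = delta_pct(30.06, 30.52)

print(f"INSERT: {insert_delta:.2f}%")
print(f"UPDATE: {update_delta:.2f}%")
```

From the rounded averages shown, the INSERT row reproduces the reported
-31.20%. The UPDATE row comes out near -1.51% rather than the reported
-1.53%; this may simply reflect rounding of the displayed averages.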

Testing:
- Pass core tests.

Change-Id: I8da7c41d61b0888378b390b8b643238433eb3b52
Reviewed-on: http://gerrit.cloudera.org:8080/18536
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Improve Kudu DML error logging in Impala
> ----------------------------------------
>
>                 Key: IMPALA-10465
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10465
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Grant Henke
>            Assignee: Riza Suminto
>            Priority: Major
>
> Kudu recently added support for INSERT_IGNORE, UPDATE_IGNORE, and 
> DELETE_IGNORE to handle cases where users want to ignore primary key errors 
> in an efficient way. Impala already does this today for its INSERT behavior; 
> however, it does so by ignoring the per-row errors from Kudu client side. This 
> requires a large error buffer (which may need to be expanded in rare cases) 
> to log all of the warning messages which users often do not care about and 
> causes significant RPC overhead.
> Instead it would be good to add a property or special IGNORE keyword to 
> leverage the new ignore operations.


