[ https://issues.apache.org/jira/browse/IMPALA-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302793#comment-17302793 ]
Aman Sinha commented on IMPALA-10564:
-------------------------------------

Capturing some summarized comments from the Gerrit review (https://gerrit.cloudera.org/c/17168/) and offline discussion with [~wzhou]:
 * The ideal long-term solution would be to skip the rows that have a decimal overflow (or other error), optionally log them in a staging area (similar to what ETL products do), and return a status that says 'Inserted N rows, skipped M rows' (we already display the first part of this message on success). The motivation is that a CTAS or INSERT-SELECT of a billion rows should not be completely aborted because of one or a few decimal value errors.
 * However, skipping rows during the write to a columnar format such as Parquet requires more thought and investigation, since it requires rewinding to the previous row.
 * One near-term option is to merge the patch changes but make the behavior configurable. We could introduce a query option use_null_for_decimal_errors, which would be FALSE by default, so the CTAS would fail. Users would have to opt in to allow NULLs to be inserted (making it a conscious choice).

> No error returned when inserting an overflowed value into a decimal column
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-10564
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10564
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> When using CTAS statements or INSERT-SELECT statements to insert rows into a
> table with decimal columns, Impala inserts NULL for overflowed decimal values
> instead of returning an error. This issue happens when the data expression for
> the decimal column in the SELECT sub-query contains at least one alias.
> This issue is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases
> where the data expression for the decimal column is a constant, so that the
> overflowed decimal value could be detected by the frontend during expression
> analysis. If the data expression for the decimal column contains an alias
> (a variable), the frontend cannot evaluate it in the expression analysis
> phase. Only the backend can evaluate the data expression, when it executes the
> fragment instances for the SELECT sub-query. The log messages showed
> that the executor detected the decimal overflow error, but somehow it did not
> propagate the error to the coordinator, hence the error was not returned to
> the client.
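
To make the options discussed in the comment easier to compare, here is a minimal Python sketch of the three behaviors: fail the statement (the proposed default), insert NULL when the user opts in, and skip the row while counting it. This is only an illustration, not Impala code. The name use_null_for_decimal_errors is the query option proposed in the comment; everything else (insert_decimals, skip_bad_rows, InsertResult, and the ROUND_HALF_UP rounding) is a hypothetical assumption made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, List, NamedTuple, Optional


class DecimalOverflow(Exception):
    """Raised when a value does not fit DECIMAL(precision, scale)."""


class InsertResult(NamedTuple):
    rows: List[Optional[Decimal]]  # None stands in for SQL NULL
    inserted: int
    skipped: int


def to_decimal(value: Decimal, precision: int, scale: int) -> Decimal:
    # Round to the target scale first (rounding mode is an assumption),
    # then check that the integer part fits in precision - scale digits.
    quantized = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if abs(quantized) >= Decimal(10) ** (precision - scale):
        raise DecimalOverflow(f"{value} does not fit DECIMAL({precision},{scale})")
    return quantized


def insert_decimals(
    values: Iterable[Decimal],
    precision: int,
    scale: int,
    use_null_for_decimal_errors: bool = False,  # option name from the comment
    skip_bad_rows: bool = False,                # hypothetical long-term behavior
) -> InsertResult:
    rows: List[Optional[Decimal]] = []
    skipped = 0
    for value in values:
        try:
            rows.append(to_decimal(value, precision, scale))
        except DecimalOverflow:
            if skip_bad_rows:
                skipped += 1          # 'Inserted N rows, skipped M rows'
            elif use_null_for_decimal_errors:
                rows.append(None)     # opt-in: store NULL instead of failing
            else:
                raise                 # default: the whole statement fails
    return InsertResult(rows, len(rows), skipped)


if __name__ == "__main__":
    data = [Decimal("12.34"), Decimal("123.45"), Decimal("99.999")]
    result = insert_decimals(data, precision=4, scale=2, skip_bad_rows=True)
    print(f"Inserted {result.inserted} rows, skipped {result.skipped} rows")
```

With both flags left at their defaults, the first overflowing value raises and nothing is returned, which mirrors the "CTAS would fail" behavior described for the near-term option.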