[ 
https://issues.apache.org/jira/browse/IMPALA-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302793#comment-17302793
 ] 

Aman Sinha commented on IMPALA-10564:
-------------------------------------

Capturing some summarized comments from the Gerrit review 
(https://gerrit.cloudera.org/c/17168/) and offline discussion with [~wzhou]:
* The ideal long-term solution would be to skip rows that hit a decimal 
overflow (or another error), optionally log them in a staging area (similar 
to what ETL products do), and return a status such as 'Inserted N rows, 
skipped M rows' (we already display the first part of this message on 
success). The motivation is that a CTAS or INSERT-SELECT of a billion rows 
should not be aborted entirely because of one or a few decimal value errors.
* However, skipping rows while writing to a columnar format such as Parquet 
requires more thought and investigation, since it requires rewinding to the 
previous row. 
* One near-term option is to merge the patch changes but make the behavior 
configurable. We could introduce a query option use_null_for_decimal_errors, 
which would be FALSE by default, so the CTAS would fail. Users would have to 
opt in to allow NULLs to be inserted (making it a conscious choice). 
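
To make the three behaviors above concrete, here is a minimal Python sketch. 
It is not Impala code: the function names, the on_overflow parameter, and the 
simplified overflow check are all hypothetical and only illustrate the 
semantics being discussed ("error" for the proposed default, "null" for the 
opt-in behavior of use_null_for_decimal_errors, "skip" for the long-term 
idea).

```python
from decimal import Decimal
from typing import Iterable, List, Optional, Tuple


class DecimalOverflowError(Exception):
    """Raised when a value does not fit the target DECIMAL type."""


def fits(value: Decimal, precision: int, scale: int) -> bool:
    # Simplified check (assumption): only the integer digits are considered,
    # so rounding of extra fractional digits is ignored.
    return abs(value) < Decimal(10) ** (precision - scale)


def insert_rows(
    values: Iterable[Decimal],
    precision: int,
    scale: int,
    on_overflow: str = "error",  # hypothetical knob: "error", "null" or "skip"
) -> Tuple[List[Optional[Decimal]], int]:
    """Return (rows written, number of rows skipped)."""
    written: List[Optional[Decimal]] = []
    skipped = 0
    for value in values:
        if fits(value, precision, scale):
            written.append(value)
        elif on_overflow == "null":
            # Opt-in behavior: store NULL instead of failing the statement.
            written.append(None)
        elif on_overflow == "skip":
            # Long-term idea: drop the row and count it.
            skipped += 1
        else:
            # Default: abort the whole statement.
            raise DecimalOverflowError(
                f"{value} does not fit DECIMAL({precision},{scale})"
            )
    return written, skipped


def summary(written: List[Optional[Decimal]], skipped: int) -> str:
    return f"Inserted {len(written)} rows, skipped {skipped} rows"
```

With on_overflow="skip", a single overflowing value among many rows only 
increments the skipped count, whereas the default aborts on the first 
overflow. The sketch sidesteps the Parquet rewind problem entirely because it 
decides before anything is written.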

> No error returned when inserting an overflowed value into a decimal column
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-10564
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10564
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>
> When CTAS or INSERT-SELECT statements insert rows into a table with decimal 
> columns, Impala inserts NULL for overflowed decimal values instead of 
> returning an error. This happens when the data expression for the decimal 
> column in the SELECT sub-query contains at least one alias. This issue is 
> similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where the data 
> expression for the decimal column is a constant, so that the frontend could 
> detect the overflowed decimal values during expression analysis. If the 
> data expression for the decimal column contains an alias (a variable), the 
> frontend cannot evaluate it during expression analysis. Only the backend 
> can evaluate the expression, when it executes the fragment instances for 
> the SELECT sub-query. The log messages showed that the executor detected 
> the decimal overflow error but, for some reason, did not propagate it to 
> the coordinator, so the error was not returned to the client.  


