Thomas Tauber-Marshall created IMPALA-6295:
----------------------------------------------
Summary: Inconsistent handling of 'nan' and 'inf' with min/max analytic fns
Key: IMPALA-6295
URL: https://issues.apache.org/jira/browse/IMPALA-6295
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.11.0
Reporter: Thomas Tauber-Marshall
Priority: Critical

Incorrect results are returned in some cases where 'nan'/'inf' are the only values in the group and codegen is enabled:

{noformat}
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> select * from test1 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 3    | 0         |
| 1    | Infinity  |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test1 group by col0 order by col0
+------+------------------------+
| col0 | min(col1)              |
+------+------------------------+
| 0    | 1.797693134862316e+308 |
| 1    | 1.797693134862316e+308 |
| 2    | -Infinity              |
| 3    | 0                      |
+------+------------------------+

> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test1 group by col0 order by col0
+------+-------------------------+
| col0 | max(col1)               |
+------+-------------------------+
| 0    | -1.797693134862316e+308 |
| 1    | Infinity                |
| 2    | -1.797693134862316e+308 |
| 3    | 0                       |
+------+-------------------------+
{noformat}

We also appear to never return 'nan' as a min or max value, despite sorting it as the lowest value when ordering a table (perhaps this is the intended behavior?):

{noformat}
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> select * from test2 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 0    | 0         |
| 3    | 0         |
| 1    | 1         |
| 2    | 2         |
| 3    | 3         |
| 1    | Infinity  |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+
{noformat}

Changing LlvmCodeGen::CodegenMinMax to use the ordered OLT/OGT float comparison functions appears to solve the first case (at least for 'nan'), but leads to us returning 'nan' as a max value in the second case.
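
To make the ordered vs. unordered distinction concrete, here is a small standalone C++ sketch. It is not Impala's code: the sentinel seeding (largest/lowest finite double), the "keep the accumulator if cmp(acc, value)" update shape, and all function names are assumptions, chosen only because a fold of that shape would be consistent with the output reported above. OLT/OGT are modeled as comparisons that are false when either operand is NaN, and ULT/UGT as comparisons that are true when either operand is NaN.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Ordered comparisons: false whenever either operand is NaN
// (the semantics of LLVM's fcmp olt / ogt, and of C++ < and > on doubles).
inline bool OrderedLess(double a, double b) { return a < b; }
inline bool OrderedGreater(double a, double b) { return a > b; }

// Unordered comparisons: true whenever either operand is NaN
// (the semantics of LLVM's fcmp ult / ugt).
inline bool UnorderedLess(double a, double b) { return !(a >= b); }
inline bool UnorderedGreater(double a, double b) { return !(a <= b); }

using Cmp = bool (*)(double, double);

// Hypothetical fold (an assumption, not taken from Impala):
// keep the accumulator when keep(acc, v) holds, otherwise take v.
inline double Fold(const std::vector<double>& vals, double seed, Cmp keep) {
  double acc = seed;
  for (double v : vals) acc = keep(acc, v) ? acc : v;
  return acc;
}

// Min seeded with the largest finite double (assumed sentinel).
inline double SentinelMin(const std::vector<double>& vals, Cmp less) {
  return Fold(vals, std::numeric_limits<double>::max(), less);
}

// Max seeded with the lowest finite double (assumed sentinel).
inline double SentinelMax(const std::vector<double>& vals, Cmp greater) {
  return Fold(vals, std::numeric_limits<double>::lowest(), greater);
}
```

Under these assumptions, the unordered variants leave the sentinel in place for a group containing only NaN, and also for a group containing only +Infinity (min) or only -Infinity (max), which is the pattern seen with codegen enabled. Switching to the ordered variants lets a lone NaN through, but the infinity cases still return the sentinel, and SentinelMax({0.0, NaN}, OrderedGreater) yields NaN while SentinelMax({NaN, 0.0}, OrderedGreater) yields 0, so the result becomes dependent on input order. Avoiding the sentinel altogether would need separate tracking of whether any value has been seen yet.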