Hello,

While using the Impala random query generator to compare Impala query
results with PostgreSQL query results, I noticed a discrepancy between
the CONCAT() functions. I'm looking for information from the community
whether the discrepancy I found is intentional (in which case I will
fix the random query generator) or unintentional (in which case I will
file an Impala defect Jira). I couldn't find bugs about this, though I
did see IMPALA-452, but IMPALA-452 doesn't explicitly call out
behavior discrepancies or preferences.

If Impala CONCAT() has an argument that evaluates to NULL, then Impala
CONCAT() returns NULL.

If PostgreSQL CONCAT() has an argument that evaluates to NULL, then
PostgreSQL will ignore it, and treat it as an empty string. Even
CONCAT(NULL) evaluates to the empty string.

PostgreSQL has a || operator that behaves like Impala CONCAT(): if a
NULL expression is on one side of ||, that || evals to NULL. (Aside:
the || operator in Impala appears to be an alias for OR, though that
seems to be undocumented. IMPALA-452 suggests this is a Hive mimic.)

I also checked Oracle (XE 11.2.0.2.0). Oracle also has a CONCAT()
function and a string concatenation operator ||. They are equivalent.
There is a difference between Oracle behavior and other databases: a
CONCAT() or || with only nulls or empty strings evaluates to NULL.
That's somewhat different edge behavior, but the CONCAT('something',
NULL) case does mimic PostgreSQL CONCAT(), not Impala CONCAT().

If Impala CONCAT() is meant to behave like PostgreSQL ||, then I can
change the random query generator to write the correct SQL dialect
from the same logical query to reduce false positive discrepancies. If
Impala CONCAT() is meant to behave like PostgreSQL CONCAT(), then
there's a defect. Please let me know the intent so I can act
accordingly.

Thanks!

Reply via email to