Manoj Durisheti created HIVE-17416:
--------------------------------------

             Summary: Hive Distinct changes column value
                 Key: HIVE-17416
                 URL: https://issues.apache.org/jira/browse/HIVE-17416
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Manoj Durisheti

Hive 1.2.1000.2.6.1.0-129

The query below uses distinct, which is expected only to dedupe the result rows. Instead, it changes the value of the first column.

*Query without Distinct:*
select
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where datestamp = 20170805
  and field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;

Result:
e_2300a  e_2300
e_2300a  e_2300
e_2300a  e_2300
e_2300a  e_2300
e_2300a  e_2300

*Query with Distinct:*
select distinct
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
  REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where datestamp = 20170805
  and field_name = 'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;

Result:
e_2300  e_2300

*Expected Result with Distinct:*
e_2300a  e_2300

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
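
For anyone triaging this, the two patterns can be checked outside Hive to confirm what each expression should return on its own. The sketch below is not Hive code. It assumes that REGEXP_EXTRACT behaves like java.util.regex (first match found, then the requested group, empty string when nothing matches) and that the HiveQL literal '\\??' reaches the regex engine as \??. The class RegexExtractCheck and the helper extract are hypothetical names used only for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractCheck {

    // Hypothetical stand-in for REGEXP_EXTRACT (assumed semantics):
    // return the requested group of the first match, or "" if no match.
    static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        String fieldName =
            "https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a";

        // Mirrors UPPER(field_name); Locale.ROOT avoids locale-specific casing.
        String upper = fieldName.toUpperCase(Locale.ROOT);

        // Same patterns as in the report, written as Java string literals.
        String rPattern = "([A-Z]_[0-9]*[A-Z]?)\\??.*"; // trailing letter allowed
        String wPattern = "([A-Z]_[0-9]*[a-z]?)\\??.*"; // lowercase never matches after UPPER

        System.out.println(extract(upper, rPattern, 1)); // prints E_2300A
        System.out.println(extract(upper, wPattern, 1)); // prints E_2300
    }
}
```

Letter case aside (the sketch upper-cases its input as the query does), the two values differ only by the trailing letter. That agrees with the result without distinct and with the expected result, so under these assumptions the patterns themselves do not explain the first column changing when distinct is added.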