Manoj Durisheti created HIVE-17416:
--------------------------------------

             Summary: Hive Distinct changes column value
                 Key: HIVE-17416
                 URL: https://issues.apache.org/jira/browse/HIVE-17416
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Manoj Durisheti


Hive version: 1.2.1000.2.6.1.0-129

The query below uses DISTINCT, which is expected only to deduplicate the
result rows. Instead, it changes the value of the first column.

*Query without Distinct:*
select
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where
datestamp = 20170805
and
field_name = 
'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;
Result:
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300
e_2300a e_2300

*Query with Distinct:*
select distinct
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[A-Z]?)\\??.*', 1) r_field_name,
REGEXP_EXTRACT(UPPER(field_name), '([A-Z]_[0-9]*[a-z]?)\\??.*', 1) w_field_name
from alpha.table_name
where
datestamp = 20170805
and
field_name = 
'https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a'
;
Result:
e_2300 e_2300

*Expected Result with Distinct:*
e_2300a e_2300
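
To make the expectation concrete, here is a minimal sketch in Python (not Hive) that applies the two patterns from the queries above to the same URL and then deduplicates the rows. It assumes that the Hive string literal '\\??' reaches the regex engine as \?? and that REGEXP_EXTRACT returns the requested group of the first match. The helper names (regexp_extract, extract_pair, distinct_rows) are hypothetical and exist only for this illustration. Because the input is upper-cased first, the sketch yields upper-case values, whereas the results pasted in this report are lower-case; the point is only that the two columns differ and that deduplication should not change either of them.

```python
import re

# Patterns copied from the queries in this report. The Hive literal '\\??'
# is assumed to reach the regex engine as \?? (an optional literal '?').
R_PATTERN = r"([A-Z]_[0-9]*[A-Z]?)\??.*"
W_PATTERN = r"([A-Z]_[0-9]*[a-z]?)\??.*"

URL = "https://www.abcd.com/details/123-main-st-abcde-xx-84004-5434484-e_2300a"


def regexp_extract(subject, pattern, group):
    """Simplified stand-in for REGEXP_EXTRACT: group of the first match.

    No-match handling is simplified here (returns None).
    """
    match = re.search(pattern, subject)
    return match.group(group) if match else None


def extract_pair(field_name):
    """Compute (r_field_name, w_field_name) the way the SELECT list does."""
    upper = field_name.upper()
    return (
        regexp_extract(upper, R_PATTERN, 1),
        regexp_extract(upper, W_PATTERN, 1),
    )


def distinct_rows(rows):
    """Order-preserving deduplication; it must never alter a row's values."""
    return list(dict.fromkeys(rows))


if __name__ == "__main__":
    # Five identical source rows, as in the result without DISTINCT.
    rows = [extract_pair(URL) for _ in range(5)]
    print(distinct_rows(rows))  # [('E_2300A', 'E_2300')]
```

After upper-casing, the optional [a-z] in the second pattern can never match, so the second column stops after the digits while the first column keeps the trailing letter. Deduplicating five identical rows should therefore leave one row with two different values, which is the expected result stated above.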





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
