[ https://issues.apache.org/jira/browse/HIVE-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal reassigned HIVE-18563:
-------------------------------------

    Assignee: Deepak Jaiswal

> "Load data into table" behavior is different between 1.2.1 and 1.2.1000
> -----------------------------------------------------------------------
>
>                 Key: HIVE-18563
>                 URL: https://issues.apache.org/jira/browse/HIVE-18563
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2
>         Environment: * OS : CentOS6
> * JDK : 1.8.0_152(Oracle)
> * HDP : 2.3.2.0 and 2.6.2.0
> * Hive : 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205
>            Reporter: Junichi Oda
>            Assignee: Deepak Jaiswal
>            Priority: Major
>
> After upgrading HDP from 2.3.2.0 to 2.6.2.0, the "load data into table"
> behavior changed.
> Data arrives hourly, and all files have the same name.
> {code:java}
> /user/user1/logs/yyyymmdd/00/part-r-00000.gz
> /user/user1/logs/yyyymmdd/01/part-r-00000.gz
> /user/user1/logs/yyyymmdd/02/part-r-00000.gz
> /user/user1/logs/yyyymmdd/03/part-r-00000.gz
> ・・・・・・・・・・・・・・・・・・・・・・・
> /user/user1/logs/yyyymmdd/22/part-r-00000.gz
> /user/user1/logs/yyyymmdd/23/part-r-00000.gz
> {code}
> Before upgrade (HDP 2.3.2.0)
> {code:java}
> HQL
> hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table
> sample_db.sample_tbl partition (dt='yyyymmdd');
>
>
> Result
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
> {code}
> Every file except the first, which kept the name part-r-00000.gz, was
> renamed to part-r-00000_copy_*.gz, so all 24 files were loaded.
> After upgrade (HDP 2.6.2.0)
> {code:java}
> HQL
> hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table
> sample_db.sample_tbl partition (dt='yyyymmdd');
>
> Result
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
> /hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
> {code}
> Only part-r-00000.gz exists.
> This file is identical to the one that was named part-r-00000_copy_23.gz
> before the upgrade.
> When I load the files one by one, all of them are loaded, as in the
> HDP 2.3.2.0 environment.
> Why is the behavior different between 2.3.2.0 and 2.6.2.0?
> Thanks in advance
>
> https://community.hortonworks.com/questions/158176/load-data-into-table-behavior-is-different-between.html

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)