[ https://issues.apache.org/jira/browse/HIVE-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pravin updated HIVE-28945:
--------------------------
    Description:
We encountered an intermittent issue while performing an {{INSERT OVERWRITE}} operation between two Hive tables with identical schemas.

* The source table, {{account_data}}, is an *external table* containing *954 columns* and approximately *10,000 rows*.
* A target table, {{account_data_temp}}, was created using the {{LIKE}} clause to mirror the schema of {{account_data}}.
* {{account_data_temp}} is also an *external table*, created using the following statement:

CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';

The data transfer was performed using the following {{INSERT OVERWRITE}} query:

INSERT OVERWRITE TABLE default.account_data_temp
SELECT * FROM default.account_data;

After executing the above query, we observed that a few *rows were missing* in the target table ({{account_data_temp}}). A similar issue was noticed when inserting data from an *internal table to an external table* as well.

*Key Observations:*
* This issue is *not consistently reproducible*; it occurs intermittently.
* The row count mismatch suggests *possible silent data loss* during the {{INSERT OVERWRITE}} operation.
* No errors or warnings were reported during query execution.

Note: We also observed the same issue with a table that has only 8 columns.

*Using JDBC driver 4.0.1.*

> Data loss observed during INSERT OVERWRITE from one table to another with identical schema, involving both internal and external tables.
> ----------------------------------------------------------------------
>
>                 Key: HIVE-28945
>                 URL: https://issues.apache.org/jira/browse/HIVE-28945
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>   Affects Versions: 3.1.3
>            Reporter: Pravin
>            Priority: Major
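
The row count mismatch described in this report could be checked with a query along the following lines. This is only a sketch for anyone trying to reproduce the issue, not part of the original report. It uses the two table names from the description. The {{SET}} line is an assumption: it is meant to keep {{COUNT(*)}} from being answered out of metastore statistics, so that both counts come from an actual scan of the data.

```sql
-- Assumption: force COUNT(*) to scan the data instead of using stored statistics.
SET hive.compute.query.using.stats=false;

-- Compare source and target row counts right after the INSERT OVERWRITE.
-- If no rows were lost, both row_count values should be equal.
SELECT 'account_data' AS table_name, COUNT(*) AS row_count
FROM default.account_data
UNION ALL
SELECT 'account_data_temp' AS table_name, COUNT(*) AS row_count
FROM default.account_data_temp;
```

Because the issue is intermittent, running the {{INSERT OVERWRITE}} followed by this check several times, and recording the two counts for each run, would show how often the mismatch occurs.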