[ 
https://issues.apache.org/jira/browse/HIVE-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin updated HIVE-28945:
--------------------------
    Description: 
We encountered an intermittent issue while performing an {{INSERT OVERWRITE}} 
operation between two Hive tables with identical schemas.
 * The source table, {{account_data}}, is an *external table* containing 
*954 columns* and approximately *10,000 rows*.

 * A target table, {{account_data_temp}}, was created using the {{LIKE}} 
clause to mirror the schema of {{account_data}}.

 * {{account_data_temp}} is also an *external table*, created using the 
following statement:

CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';

 

The data transfer was performed using the following {{INSERT OVERWRITE}} query:

 

INSERT OVERWRITE TABLE default.account_data_temp 
SELECT * FROM default.account_data;

 

After executing the above query, we observed that a few *rows were missing* in 
the target table ({{account_data_temp}}). A similar issue was noticed when 
inserting data from an *internal table to an external table* as well.
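
A row-count comparison along the following lines can confirm the mismatch after each run. This is a minimal sketch and not part of the original report. The {{SET}} line is an assumption: it guards against counts being answered from table statistics rather than from a scan of the data.

```sql
-- Assumption: force COUNT(*) to scan the data instead of using metastore statistics.
SET hive.compute.query.using.stats=false;

-- Compare source and target row counts after the INSERT OVERWRITE.
SELECT 'account_data' AS table_name, COUNT(*) AS row_count
FROM default.account_data
UNION ALL
SELECT 'account_data_temp' AS table_name, COUNT(*) AS row_count
FROM default.account_data_temp;
```

If the two counts differ on some runs and match on others, that matches the intermittent behaviour described here.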

 

*Key Observations:*
 * This issue is *not consistently reproducible* — it occurs intermittently.

 * The row count mismatch suggests *possible silent data loss* during the 
{{INSERT OVERWRITE}} operation.

 * No errors or warnings were reported during query execution.
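
To narrow down which rows are dropped in a failing run, the source can be compared against the target on a key column. The sketch below assumes a hypothetical unique column {{account_id}}, which the report does not mention; substitute the table's actual key.

```sql
-- Hypothetical key column: account_id (not named in the report).
-- Returns the keys present in the source but absent from the target.
SELECT s.account_id
FROM default.account_data s
LEFT OUTER JOIN default.account_data_temp t
  ON s.account_id = t.account_id
WHERE t.account_id IS NULL;
```

An outer join with a NULL filter is used here rather than an anti join so that the query does not depend on syntax that may be unavailable in the affected version.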

Note: We also observed the same issue with a table having 8 columns.

*using JDBC driver 4.0.1*


> Data loss observed during INSERT OVERWRITE from one table to another with 
> identical schema, involving both internal and external tables.
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28945
>                 URL: https://issues.apache.org/jira/browse/HIVE-28945
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.3
>            Reporter: Pravin
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
