[ 
https://issues.apache.org/jira/browse/HIVE-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Zeller updated HIVE-17249:
-------------------------------
    Attachment: partition1log.txt

Attached is an extract of messages relating to one partition from the
hivemetastore.log file.

> Concurrent appendPartition calls lead to data loss
> --------------------------------------------------
>
>                 Key: HIVE-17249
>                 URL: https://issues.apache.org/jira/browse/HIVE-17249
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.2.1
>         Environment: Hortonworks HDP 2.4.
> MySQL metastore.
>            Reporter: Hans Zeller
>         Attachments: partition1log.txt
>
>
> We are running into a problem with data getting lost when loading data in 
> parallel into a partitioned Hive table. The data loader runs on multiple 
> nodes and it dynamically creates partitions as it needs them, using the 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(String,
> String, String) interface. We assume that if multiple processes try to create 
> the same partition at the same time, only one of them succeeds while the 
> others fail.
> What we are seeing is that the partition gets created, but a few of the 
> created files end up in the .Trash folder in HDFS. From the metastore log, we 
> assume the following is happening in the threads of the metastore server:
> - Thread 1: A first process tries to create a partition.
> - Thread 1: The 
> org.apache.hadoop.hive.metastore.HiveMetaStore.append_common() method
> creates the HDFS directory.
> - Thread 2: A second process tries to create the same partition.
> - Thread 2: Notices that the directory already exists and skips the step of 
> creating it.
> - Thread 2: Updates the metastore.
> - Thread 2: Returns success to the caller.
> - Caller 2: Creates a file in the partition directory and starts inserting.
> - Thread 1: Tries to update the metastore, but this fails because thread 2 
> has already inserted the partition. It retries the operation, which fails 
> again.
> - Thread 1: Aborts the transaction and moves the HDFS directory to the trash, 
> since it knows that it created the directory.
> - Thread 1: Returns failure to the caller.
> The caller that received success (caller 2 above) can now continue to load 
> data, apparently successfully, but the file it writes is actually already in 
> the trash. It reports success, but the data is not inserted and not visible 
> in the table.
> Note that in our case, the callers that got an error continue as well; they 
> ignore the error. I think they automatically create the HDFS partition 
> directory when they create their output files. These processes can insert 
> data successfully. We believe the data that is lost comes from the process 
> that successfully created the partition.
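
The interleaving described in the report can be replayed deterministically
with a small toy model. This is only a sketch, not Hive code: the in-memory
sets and maps standing in for HDFS and the metastore, the method names
(mkdirIfMissing, insertPartition, moveToTrash), and the paths are all
hypothetical, and the steps run sequentially in the reported order rather
than on real threads.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the reported interleaving; hypothetical names, not Hive code. */
public class AppendPartitionRace {
    // Hypothetical in-memory stand-ins for HDFS and the metastore database.
    static final Set<String> dirs = new HashSet<>();
    static final Map<String, String> files = new HashMap<>();  // path -> contents
    static final Map<String, String> trash = new HashMap<>();  // path -> contents
    static final Set<String> partitions = new HashSet<>();

    /** Creates the directory if absent; true only for the thread that created it. */
    static boolean mkdirIfMissing(String dir) {
        return dirs.add(dir);
    }

    /** Registers the partition; false if it is already registered. */
    static boolean insertPartition(String name) {
        return partitions.add(name);
    }

    /** Rollback step: moves the directory and every file under it to the trash. */
    static void moveToTrash(String dir) {
        dirs.remove(dir);
        files.entrySet().removeIf(e -> {
            if (e.getKey().startsWith(dir + "/")) {
                trash.put(e.getKey(), e.getValue());
                return true;
            }
            return false;
        });
    }

    /** Replays the reported order; returns true if caller 2's file survives. */
    static boolean replay() {
        dirs.clear();
        files.clear();
        trash.clear();
        partitions.clear();
        String part = "ds=1";
        String dir = "/warehouse/t/ds=1";
        String file = dir + "/part-0";

        boolean t1CreatedDir = mkdirIfMissing(dir);  // thread 1 creates the directory
        mkdirIfMissing(dir);                         // thread 2 finds it, skips creation
        boolean t2Ok = insertPartition(part);        // thread 2 updates the metastore
        if (t2Ok) {
            files.put(file, "rows from caller 2");   // caller 2 starts inserting
        }
        boolean t1Ok = insertPartition(part);        // thread 1 fails: already present
        if (!t1Ok && t1CreatedDir) {
            moveToTrash(dir);                        // thread 1 rolls back "its" directory
        }
        return files.containsKey(file);
    }

    public static void main(String[] args) {
        boolean survived = replay();
        System.out.println("partition registered: " + partitions.contains("ds=1"));
        System.out.println("caller 2 file still in partition: " + survived);
        System.out.println("files in trash: " + trash.keySet());
    }
}
```

Running main should report that the partition is registered while caller 2's
file sits in the trash, which matches the symptom above. In this model the
rollback is unsafe because having created the directory does not imply being
its only user by the time the metastore update fails.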



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
