[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code:

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception because it 
exists. Rollback transaction.
 ** R2 thinks it created the directory, so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster (R1 = thread 2303, R2 
= thread 2302):
{code:java}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:720)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:740)
 

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Affects Version/s: 2.1.0
   3.1.2

> Concurrent partition creation requests cause HDFS folder to be deleted
> --
>
> Key: HIVE-23437
> URL: https://issues.apache.org/jira/browse/HIVE-23437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Affects Versions: 2.1.0, 3.1.2
> Environment: * Hive version: 1.2.1000.2.6.5.0-292 (HDP 2.6.5)
>  * HDFS version: 2.7.3.2.6.5.0-292 (HDP 2.6.5)
>Reporter: Marc Demierre
>Priority: Major
>
> There seems to be a race condition in Hive Metasore when issuing several 
> concurrent partition creation requests for the same new partition.
> In our case, this triggered due to Kafka Connect Hive integration, which 
> fires simultaneous partition creation requests from all its tasks when 
> syncing to Hive.
> We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
> the same in 3.1.2 (latest Hive release).
> Our investigation pointed to the following code:
> [https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]
> The generic scenario is the following:
>  # T1 (time period 1):
>  ** R1 (request 1) creates the HDFS dir
>  ** R2 also tries creating the HDFS dir
>  ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
> interverted)
>  # T2:
>  ** R1 creates the partition in metastore DB, all OK
>  # T3:
>  ** R2 tries to create partition in metastore DB, gets exception because it 
> exists. Rollback transaction.
>  ** R2 thinks it created the directory, so it removes it
>  # T4: State is invalid:
>  ## Partition exists
>  ## HDFS folder does not exist
>  ## Some Hive/Spark queries fail when trying to use the folder
> Here are some logs of the issue happening on our cluster (R1 = thread 2303, 
> R2 = thread 2302):
> {code:java}
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
> metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
> deleting  
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
> e=2020-05-11/hour=11
> 2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
> (TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
> 'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
>  to trash at: 
> hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
> metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - 
> Moved to trash: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
> Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
> javax.

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception because it 
exists. Rollback transaction.
 ** R2 thinks it created the directory, so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster (R1 = thread 2303, R2 
= thread 2302):
{code:java}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
   

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Summary: Concurrent partition creation requests cause underlying HDFS 
folder to be deleted  (was: Concurrent partition creation requests cause HDFS 
folder to be deleted)

> Concurrent partition creation requests cause underlying HDFS folder to be 
> deleted
> -
>
> Key: HIVE-23437
> URL: https://issues.apache.org/jira/browse/HIVE-23437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Affects Versions: 2.1.0, 3.1.2
> Environment: * Hive version: 1.2.1000.2.6.5.0-292 (HDP 2.6.5)
>  * HDFS version: 2.7.3.2.6.5.0-292 (HDP 2.6.5)
>Reporter: Marc Demierre
>Priority: Major
>
> There seems to be a race condition in Hive Metasore when issuing several 
> concurrent partition creation requests for the same new partition.
> In our case, this triggered due to Kafka Connect Hive integration, which 
> fires simultaneous partition creation requests from all its tasks when 
> syncing to Hive.
> We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
> the same in 3.1.2 (latest Hive release).
> Our investigation pointed to the following code (here in Hive 2.1.0, the base 
> for HDP 2.6.5):
> [https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]
> Same code in 3.1.2:
> https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202
> The generic scenario is the following:
>  # T1 (time period 1):
>  ** R1 (request 1) creates the HDFS dir
>  ** R2 also tries creating the HDFS dir
>  ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
> interverted)
>  # T2:
>  ** R1 creates the partition in metastore DB, all OK
>  # T3:
>  ** R2 tries to create partition in metastore DB, gets exception because it 
> exists. Rollback transaction.
>  ** R2 thinks it created the directory, so it removes it
>  # T4: State is invalid:
>  ## Partition exists
>  ## HDFS folder does not exist
>  ## Some Hive/Spark queries fail when trying to use the folder
> Here are some logs of the issue happening on our cluster (R1 = thread 2303, 
> R2 = thread 2302):
> {code:java}
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
> metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
> deleting  
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
> e=2020-05-11/hour=11
> 2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
> (TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
> 'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
>  to trash at: 
> hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
> metastore.hivemetastoressimpl (HiveMetaSt

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception from DB 
because it exists. Rollback transaction.
 ** R2 thinks it created the directory (in fact they both did we do not know 
who), so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster (R1 = thread 2303, R2 
= thread 2302):
{code:java}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExcepti

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-11 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception from DB 
because it exists. Rollback transaction.
 ** R2 thinks it created the directory (in fact they both did we do not know 
who), so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster in a standalone 
metastore (R1 = thread 2303, R2 = thread 2302):
{code:java}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleus.api.jdo.Nucl

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception from DB 
because it exists. Rollback transaction.
 ** R2 thinks it created the directory (in fact they both did we do not know 
who), so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster in a standalone 
metastore (R1 = thread 2303, R2 = thread 2302):
{code:java}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
cmd=append_partition_by_name
: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleus.api.jdo.Nucl

[jira] [Commented] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre commented on HIVE-23437:
--

After trying a workaround on the client side to delay the calls, we observed a 
second instance of the problem:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: sbd_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:sbd_dev_autodump, tableName:sbd_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:int, comment:null), 
FieldSchema(name:salary, type:int, comment:null)], 
location:hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18,
 inputFormat:org.apache.hadoop.hiv

[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre edited comment on HIVE-23437 at 5/12/20, 8:25 AM:


After trying a workaround on the client side to ensure the calls are not 
simultaneous by delaying them. It didn't solve the issue, only made it rarer. 
We also observed a second instance of the problem:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: sbd_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:sbd_dev_autodump, tableName:sbd_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:int, comment:null), 
FieldSchema(name:salary, type:int, comment:null)], 

[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre edited comment on HIVE-23437 at 5/12/20, 8:25 AM:


We tried a workaround on the client side to ensure the calls are not 
simultaneous by delaying them. It didn't solve the issue, only made it rarer. 
We also observed a second instance of the problem:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: sbd_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:sbd_dev_autodump, tableName:sbd_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:int, comment:null), 
FieldSchema(name:salary, type:int, comment:null)], 
loc

[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre edited comment on HIVE-23437 at 5/12/20, 8:44 AM:


We tried a workaround on the client side to ensure the calls are not 
simultaneous by delaying them. It didn't solve the issue, only made it rarer. 
We also observed a second instance of the problem which is slightly different:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: sbd_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Current/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - 
ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.2  
cmd=append_partition_by_name: db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:sbd_dev_autodump, tableName:sbd_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:int, comment:null), 
FieldSchema(name:salary, ty

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2020-05-12 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Environment: 
* Hive version: 1.2.1000.2.6.5.0-292 (HDP 2.6.5)
 * HDFS version: 2.7.3.2.6.5.0-292 (HDP 2.6.5)
 * Metastore DB: PostgreSQL 9.4

  was:
* Hive version: 1.2.1000.2.6.5.0-292 (HDP 2.6.5)
 * HDFS version: 2.7.3.2.6.5.0-292 (HDP 2.6.5)


> Concurrent partition creation requests cause underlying HDFS folder to be 
> deleted
> -
>
> Key: HIVE-23437
> URL: https://issues.apache.org/jira/browse/HIVE-23437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Affects Versions: 2.1.0, 3.1.2
> Environment: * Hive version: 1.2.1000.2.6.5.0-292 (HDP 2.6.5)
>  * HDFS version: 2.7.3.2.6.5.0-292 (HDP 2.6.5)
>  * Metastore DB: PostgreSQL 9.4
>Reporter: Marc Demierre
>Priority: Major
>
> There seems to be a race condition in Hive Metasore when issuing several 
> concurrent partition creation requests for the same new partition.
> In our case, this triggered due to Kafka Connect Hive integration, which 
> fires simultaneous partition creation requests from all its tasks when 
> syncing to Hive.
> We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
> the same in 3.1.2 (latest Hive release).
> Our investigation pointed to the following code (here in Hive 2.1.0, the base 
> for HDP 2.6.5):
> [https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]
> Same code in 3.1.2:
> https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202
> The generic scenario is the following:
>  # T1 (time period 1):
>  ** R1 (request 1) creates the HDFS dir
>  ** R2 also tries creating the HDFS dir
>  ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
> interverted)
>  # T2:
>  ** R1 creates the partition in metastore DB, all OK
>  # T3:
>  ** R2 tries to create partition in metastore DB, gets exception from DB 
> because it exists. Rollback transaction.
>  ** R2 thinks it created the directory (in fact they both did we do not know 
> who), so it removes it
>  # T4: State is invalid:
>  ## Partition exists
>  ## HDFS folder does not exist
>  ## Some Hive/Spark queries fail when trying to use the folder
> Here are some logs of the issue happening on our cluster in a standalone 
> metastore (R1 = thread 2303, R2 = thread 2302):
> {code:java}
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
> (HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
> db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 part=time=ingestion/buc
> ket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
> (HiveMetaStore.java:logAuditEvent(319)) - 
> ugi=sbd-fh-connect-autodumph...@sbd.corproot.net  ip=10.222.76.1  
> cmd=append_partition_by_name
> : db=sbd_dev_autodump tbl=sbd_dev_debug_hive_4 
> part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
> 2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
> (FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly
> /date=2020-05-11/hour=11
> 2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
> metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
> deleting  
> hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
> e=2020-05-11/hour=11
> 2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
> (TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
> 'hdfs://sbd/data/sbd/dev/sbd.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
>  to trash at: 
> hdfs://sbd/user/sbd-fh-connect-autodumphdfs/.Trash/Curr

[jira] [Updated] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2021-08-24 Thread Marc Demierre (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc Demierre updated HIVE-23437:
-
Description: 
There seems to be a race condition in Hive Metasore when issuing several 
concurrent partition creation requests for the same new partition.

In our case, this triggered due to Kafka Connect Hive integration, which fires 
simultaneous partition creation requests from all its tasks when syncing to 
Hive.

We are running HDP 2.6.5 but a quick survey of the upstream code still shows 
the same in 3.1.2 (latest Hive release).

Our investigation pointed to the following code (here in Hive 2.1.0, the base 
for HDP 2.6.5):

[https://github.com/apache/hive/blob/rel/release-2.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2127]

Same code in 3.1.2:

https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3202

The generic scenario is the following:
 # T1 (time period 1):
 ** R1 (request 1) creates the HDFS dir
 ** R2 also tries creating the HDFS dir
 ** Both succeed (as if it already exists it succeeds, R1/R2 could be 
interverted)
 # T2:
 ** R1 creates the partition in metastore DB, all OK
 # T3:
 ** R2 tries to create partition in metastore DB, gets exception from DB 
because it exists. Rollback transaction.
 ** R2 thinks it created the directory (in fact they both did we do not know 
who), so it removes it
 # T4: State is invalid:
 ## Partition exists
 ## HDFS folder does not exist
 ## Some Hive/Spark queries fail when trying to use the folder

Here are some logs of the issue happening on our cluster in a standalone 
metastore (R1 = thread 2303, R2 = thread 2302):

{code:none}
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2303: append_partition_by_name: 
db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2302: append_partition_by_name: 
db=myproj_autodump tbl=myproj_debug_hive_4 part=time=ingestion/buc
ket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2303]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net   
   ip=10.222.76.1  cmd=append_partition_by_name
: db=myproj_autodump tbl=myproj_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:46,379 INFO  [pool-7-thread-2302]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydoman.net   
   ip=10.222.76.1  cmd=append_partition_by_name
: db=myproj_autodump tbl=myproj_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,953 INFO  [pool-7-thread-2302]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,957 INFO  [pool-7-thread-2303]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly
/date=2020-05-11/hour=11
2020-05-11 13:43:47,986 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/dat
e=2020-05-11/hour=11
2020-05-11 13:43:47,992 INFO  [pool-7-thread-2302]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11'
 to trash at: 
hdfs://platfrom/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 INFO  [pool-7-thread-2302]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=11
2020-05-11 13:43:47,993 ERROR [pool-7-thread-2302]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@548a5b6c" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
  Detail: Key ("PART_NAME", 
"TBL_ID")=(time=ingestion/bucket=hourly/date=2020-05-11/hour=11, 6015512) 
already exists.
at 
org.datanucleu

[jira] [Comment Edited] (HIVE-23437) Concurrent partition creation requests cause underlying HDFS folder to be deleted

2021-08-24 Thread Marc Demierre (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105218#comment-17105218
 ] 

Marc Demierre edited comment on HIVE-23437 at 8/24/21, 8:54 AM:


We tried a workaround on the client side to ensure the calls are not 
simultaneous by delaying them. It didn't solve the issue, only made it rarer. 
We also observed a second instance of the problem which is slightly different:
 * T1:
 ** R1 creates the directory, then is paused/waiting
 * T2:
 ** R2 arrives, does not create the directory as it exists
 ** R2 creates the partition (wins the race on DB) and completes
 * T3:
 ** R1 resumes, sees that it failed the DB transaction, deletes the folder

Relevant logs (R1=2558, R2=2556):
{code:java}
 2020-05-11 20:00:00,944 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:00,945 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2556: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,311 INFO  [pool-7-thread-2556]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,481 INFO  [pool-7-thread-2558]: common.FileUtils 
(FileUtils.java:mkdir(573)) - Creating directory if it doesn't exist: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,521 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(352)) - Updating partition stats 
fast for: myproj_dev_debug_hive_4
2020-05-11 20:00:01,537 WARN  [pool-7-thread-2556]: hive.log 
(MetaStoreUtils.java:updatePartitionStatsFast(355)) - Updated size to 0
2020-05-11 20:00:01,764 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(41)) - 
deleting  
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: fs.TrashPolicyDefault 
(TrashPolicyDefault.java:moveToTrash(168)) - Moved: 
'hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18'
 to trash at: 
hdfs://platform/user/kafka-dump/.Trash/Current/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,787 INFO  [pool-7-thread-2558]: 
metastore.hivemetastoressimpl (HiveMetaStoreFsImpl.java:deleteDir(48)) - Moved 
to trash: 
hdfs://platform/data/myproj/dev/myproj.dev.debug-hive-4/time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:01,788 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(217)) - 
Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: 
javax.jdo.JDODataStoreException: Insert of object 
"org.apache.hadoop.hive.metastore.model.MPartition@3254e57d" using statement 
"INSERT INTO "PARTITIONS" 
("PART_ID","CREATE_TIME","LAST_ACCESS_TIME","PART_NAME","SD_ID","TBL_ID") 
VALUES (?,?,?,?,?,?)" failed : ERROR: duplicate key value violates unique 
constraint "UNIQUEPARTITION"
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(775)) - 2558: append_partition_by_name: 
db=myproj_dev_autodump tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,788 INFO  [pool-7-thread-2558]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(319)) - ugi=kafka-d...@platform.mydomain.net  
ip=10.222.76.2  cmd=append_partition_by_name: db=myproj_dev_autodump 
tbl=myproj_dev_debug_hive_4 
part=time=ingestion/bucket=hourly/date=2020-05-11/hour=18
2020-05-11 20:00:03,869 ERROR [pool-7-thread-2558]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(203)) - 
AlreadyExistsException(message:Partition already 
exists:Partition(values:[ingestion, hourly, 2020-05-11, 18], 
dbName:myproj_dev_autodump, tableName:myproj_dev_debug_hive_4, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, 
type:string, comment:null), FieldSchema(name:age, type:i