Jian Feng created HUDI-4066:
-------------------------------

             Summary: HiveMetastoreBasedLockProvider cannot release lock when writer fails
                 Key: HUDI-4066
                 URL: https://issues.apache.org/jira/browse/HUDI-4066
             Project: Apache Hudi
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.1
            Reporter: Jian Feng


We use HiveMetastoreBasedLockProvider in our production environment: one writer ingests data with Flink, while another writer periodically deletes old partitions with Spark. Sometimes the Spark job fails, but the lock it acquired is never released, and from that point on every writer fails to acquire the lock.
{code:java}
// error log
22/04/01 08:12:18 INFO TransactionManager: Transaction starting without a transaction owner
22/04/01 08:12:18 INFO LockManager: LockProvider org.apache.hudi.hive.HiveMetastoreBasedLockProvider
22/04/01 08:12:19 INFO metastore: Trying to connect to metastore with URI thrift://10.128.152.245:9083
22/04/01 08:12:19 INFO metastore: Opened a connection to metastore, current connections: 1
22/04/01 08:12:19 INFO metastore: Connected to metastore.
22/04/01 08:12:20 INFO HiveMetastoreBasedLockProvider: ACQUIRING lock at database dev_video and table dwd_traffic_log
22/04/01 08:12:25 INFO TransactionManager: Transaction ending without a transaction owner
22/04/01 08:12:25 INFO HiveMetastoreBasedLockProvider: RELEASING lock at database dev_video and table dwd_traffic_log
22/04/01 08:12:25 INFO TransactionManager: Transaction ended without a transaction owner
Exception in thread "main" org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:71)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:51)
    at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:430)
    at org.apache.hudi.client.SparkRDDWriteClient.deletePartitions(SparkRDDWriteClient.java:261)
    at org.apache.hudi.DataSourceUtils.doDeletePartitionsOperation(DataSourceUtils.java:234)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:991)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:991)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete$.$anonfun$main$2(ExpiredPartitionDelete.scala:82)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete$.$anonfun$main$2$adapted(ExpiredPartitionDelete.scala:65)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete$.$anonfun$main$1(ExpiredPartitionDelete.scala:65)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete$.$anonfun$main$1$adapted(ExpiredPartitionDelete.scala:61)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete$.main(ExpiredPartitionDelete.scala:61)
    at com.shopee.ci.hudi.tasks.ExpiredPartitionDelete.main(ExpiredPartitionDelete.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieLockException: FAILED_TO_ACQUIRE lock at database dev_video and table dwd_traffic_log
    at org.apache.hudi.hive.HiveMetastoreBasedLockProvider.tryLock(HiveMetastoreBasedLockProvider.java:114)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:62)
    ... 57 more
Caused by: java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Internal error processing lock
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.hudi.hive.HiveMetastoreBasedLockProvider.acquireLockInternal(HiveMetastoreBasedLockProvider.java:185)
    at org.apache.hudi.hive.HiveMetastoreBasedLockProvider.acquireLock(HiveMetastoreBasedLockProvider.java:139)
    at org.apache.hudi.hive.HiveMetastoreBasedLockProvider.tryLock(HiveMetastoreBasedLockProvider.java:112)
    ... 58 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing lock
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:4743)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:4730)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
    at com.sun.proxy.$Proxy45.lock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2348)
    at com.sun.proxy.$Proxy45.lock(Unknown Source)
    at org.apache.hudi.hive.HiveMetastoreBasedLockProvider.lambda$acquireLockInternal$0(HiveMetastoreBasedLockProvider.java:184)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}
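
For anyone hit by this before a fix lands, below is a minimal workaround sketch (not part of Hudi or this ticket) that lists and releases the stale metastore lock directly through HiveMetaStoreClient. The metastore URI, database, and table names are taken from the log above; the class name ReleaseStaleLock is hypothetical, and the showLocks(ShowLocksRequest) call assumes a Hive 2.x/3.x client. Use with care: it releases every lock on the table, including any held by a live writer.
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.ShowLocksRequest;
import org.apache.hadoop.hive.metastore.api.ShowLocksResponse;
import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement;

// Hypothetical helper, not part of Hudi: clears stale metastore locks
// left behind by a writer that died without releasing its lock.
public class ReleaseStaleLock {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    // Metastore URI taken from the log above
    conf.set("hive.metastore.uris", "thrift://10.128.152.245:9083");
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      // List all metastore locks on the table the failed writer was using
      ShowLocksRequest req = new ShowLocksRequest();
      req.setDbname("dev_video");
      req.setTablename("dwd_traffic_log");
      ShowLocksResponse resp = client.showLocks(req);
      for (ShowLocksResponseElement lock : resp.getLocks()) {
        System.out.printf("releasing lock %d (state=%s)%n",
            lock.getLockid(), lock.getState());
        // Releases the lock regardless of owner; only safe when the
        // owning writer is known to be dead.
        client.unlock(lock.getLockid());
      }
    } finally {
      client.close();
    }
  }
}
{code}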


